
searching with more than one word in a list of directories


farha


Hello there,

 

I am pretty new to PHP and was wondering if anyone could give me some input. Basically, I am creating a search engine that searches for words in certain files. I am trying to search using two or more words. For instance, one file may contain the sentence 'the cat ran across the road', and another file 'the road is long'. I would like to search for 'cat road' and have both files show up with each word highlighted. At the moment 'cat road' is being searched as a single string.

 

I have exploded the keyword and broken the array into separate words, but for some reason it is still not searching. Again, I would appreciate any feedback.

 

Just to show you a bit of my code to explain:

 

 

if(!empty($_GET['keyword']))
{
    $user_input = trim($_GET['keyword']);
    $user_input = preg_replace("/ {2,}/", " ", $user_input);
    $keyword = explode(" ", $user_input);

    for($i=0;$i<count($keyword);$i++)
    {
        echo("$keyword[$i]\n");
        $keyword = $_GET['keyword'];
    }
}

 

Thanks for your time,

 

Farha


First thing, when replacing multiple spaces I like to use something like:

 

$user_input = preg_replace("/\s+/", " ", $user_input);

Same thing, just a bit neater. :)
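One small difference worth seeing in action: `/ {2,}/` only collapses runs of literal spaces, while `/\s+/` also catches tabs and newlines. A quick sketch, with a made-up input string:

```php
<?php
// Hypothetical input: two spaces, a tab, then one more space between the words.
$user_input = "cat  \t road";

// Collapses runs of two or more literal spaces only; the tab is left in place.
$spaces_only = preg_replace("/ {2,}/", " ", $user_input);

// Collapses any run of whitespace (spaces, tabs, newlines) into one space.
$any_whitespace = preg_replace("/\s+/", " ", $user_input);

var_dump(explode(" ", $spaces_only));    // three elements, the middle one is a tab
var_dump(explode(" ", $any_whitespace)); // two elements: "cat" and "road"
```

With the first pattern a stray tab ends up as its own "keyword" after explode(), which is probably not what you want.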

 

Back to the main question. For each file you're going to either loop over each keyword, or for each keyword you're going to loop over each file. It's a matter of determining which would allow better results. Hard to say, but I'd probably try looping over each keyword for every file. So...

 

$files = glob("path/to/files/*");
$keywords = preg_replace("/\s+/", " ", trim($user_input));
$keywords = explode(" ", $keywords);

foreach($files as $file){
    $contents = file_get_contents($file);
    foreach($keywords as $word){
        // search the files $contents for $word here and store in a multi-dimensional array
    }
}
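To make the "store in a multi-dimensional array" part more concrete, here is a rough sketch. The helper name `search_files()` and the throwaway demo files are my own assumptions, and it counts matches with substr_count() on lower-cased text rather than recording positions:

```php
<?php
// search_files() is a hypothetical helper, not a built-in PHP function.
// It returns an array shaped like: [file name => [keyword => match count]].
function search_files(array $files, array $keywords): array
{
    $results = array();
    foreach ($files as $file) {
        if (!is_file($file)) {
            continue; // glob() can also return directories
        }
        $contents = strtolower(file_get_contents($file));
        foreach ($keywords as $word) {
            if ($word === '') {
                continue; // substr_count() rejects an empty needle
            }
            $count = substr_count($contents, strtolower($word));
            if ($count > 0) {
                // First dimension: the file. Second dimension: the keyword.
                $results[$file][$word] = $count;
            }
        }
    }
    return $results;
}

// Demo with two throwaway files in the system temp directory.
$dir = sys_get_temp_dir() . "/search_demo_" . uniqid();
mkdir($dir);
file_put_contents("$dir/a.txt", "the cat ran across the road");
file_put_contents("$dir/b.txt", "the road is long");

$results = search_files(glob("$dir/*"), array("cat", "road"));
print_r($results);
```

Because the outer key is the file name, each file shows up once no matter how many keywords it contains.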

 

Now if you're after all occurrences and the position of each inside the file, you'd need to use strpos() and make use of the third parameter, which sets the offset where the search starts. You'd need to store each location inside an array that relates to that file, so you can highlight each match. You'll end up with some pretty big multi-dimensional arrays depending on the search and the number of files.
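A minimal sketch of the offset idea, assuming a helper I've called `find_all_positions()` (not a PHP built-in). I've swapped in stripos(), the case-insensitive sibling of strpos(), which takes the same offset parameter:

```php
<?php
// find_all_positions() is a hypothetical helper name.
// It returns the byte offset of every match of $needle in $haystack.
function find_all_positions(string $haystack, string $needle): array
{
    $positions = array();
    if ($needle === '') {
        return $positions; // nothing sensible to search for
    }
    $offset = 0;
    // The third argument tells stripos() where to resume searching.
    while (($pos = stripos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + strlen($needle); // jump past the match just found
    }
    return $positions;
}

$positions = find_all_positions("the cat ran across the road", "the");
print_r($positions); // positions 0 and 19
```

Note the `!== false` check: a match at position 0 is a valid result, so a loose comparison would end the loop too early.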

 

If you need more help let us know.

 

Good luck.


Thanks very much for the feedback.

 

I have basically managed to have it search for each keyword in every file. The problem is that I get the same file coming back twice, once for each keyword. I was hoping to get one result per file covering both keywords found in it (if that makes sense).

 

So for example, when I search for 'Lorem ipsum' I get the same file, 'lorem.txt', as two results: one showing where 'lorem' is in the text file and one showing where 'ipsum' is. I was hoping for just one result showing where both 'lorem' and 'ipsum' are in that file.

 

I still haven't figured out how to use strpos(), so I'm using stristr() instead.

 

I have pasted my code below so that you can see what I've done.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="Content-Language" content="en-us" />
<style type='text/css'>
<!--
	body {font-family: Verdana, Helvetica, sans-serif; font-size: 12px; width: 900px; margin: 0px auto;}
	code {font-family: Verdana, Helvetica, sans-serif; font-size: 12px; color: #336699;}
-->
</style>
    <title>Untitled</title>
</head>

<body>
<h1>PHP Search</h1>
<form name='searchform' id='searchform' action='<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>' method='get'>
	<input type='text' name='keyword' id='keyword'/>
	<input type='submit' value='submit' />
</form>
</body>
</html>


<?php

if(!empty($_GET['keyword'])){

$files = glob("data/*");
$user_input = preg_replace("/\s+/", " ", $_GET['keyword']);      
$keywords = preg_replace("/\s+/", " ", trim($user_input));
$keywords = explode(" ", $keywords);


$hits = null;
$full_url = $_SERVER['PHP_SELF'];
$site_url = str_ireplace('index.php', '', $full_url); // eregi_replace() was removed in PHP 7
$directory_list = array('data');


foreach($directory_list as $dirlist)
{

    $directory_url = $site_url.$dirlist."/";
    $getDirectory = opendir($dirlist);

    while($dirName = readdir($getDirectory)){
        $getdirArray[] = $dirName;
	}

    closedir($getDirectory);
    $dirCount = count($getdirArray);
    sort($getdirArray);

 	for($dir=0; $dir < $dirCount; $dir++)
    {
        if (substr($getdirArray[$dir], 0, 1) != ".")
        {
            $label = str_replace('_', ' ', $getdirArray[$dir]); // eregi_replace() was removed in PHP 7
            $directory = $dirlist.'/'.$getdirArray[$dir]."/";
            $complete_url = $site_url.$directory;

            if(is_dir($directory))
            {
               $myDirectory = opendir($directory);
                $dirArray = null;

                while($entryName = readdir($myDirectory)){
                    $dirArray[] = $entryName;
				}
                closedir($myDirectory);

                $indexCount = count($dirArray);
                sort($dirArray);
            }
	        else
	        {
	            $hits++;

	            if(file_exists($dirlist."/".$label))
	            {
					$fd=fopen($dirlist."/".$label,"r");
					$text=fread($fd, 50000);
					fclose($fd); // release the file handle once the text is read

					foreach ($keywords as $keyword )
					{
	                    $do=stristr($text, $keyword);

	                    if($do)
	                    {
	                        $strip = strip_tags($text);

	                        echo "<span>";
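	                        // preg_quote() escapes regex metacharacters in the keyword; the stray ';' after the condition is removed
	                        if(preg_match_all("/((\s\S*){0,3})(".preg_quote($keyword, '/').")((\s?\S*){0,3})/i", $strip, $match, PREG_SET_ORDER))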
	                        {

	                            $number=count($match);

	                            if($number > 0)
	                            {
	                                echo "<a href='".$dirlist."/".$label."'>".$label."</a> (".$number.")";
	                                echo "<br />";
	                            }

	                            for ($h=0;$h<$number;$h++)
	                            {
	                                if (!empty($match[$h][3]))
	                                {
	                                    printf("<i><b>..</b> %s<b>%s</b>%s <b>..</b></i>", $match[$h][1], $match[$h][3], $match[$h][4]);
	                                }
	                            }
	                            echo "</span><br />";

	                            if($number > 0):
	                                echo "<hr />";
								endif;
	                        }
	                    }
					}
				}
	        }

	 	}

	}

}
}


?>

 


I haven't tested this theory, but I've got a feeling you'll need to trim() the values in the arrays and the user input, and possibly use a case conversion such as strtolower().

 

I am not sure how to use case conversion, but even if I search in upper case or add lots of spaces after the keyword, it still returns the correct words I look for. This should be OK, right?
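For reference, a small sketch of what trim() plus case conversion would look like next to stristr(). The sample text and keyword are made up. stristr() already ignores case, which is likely why the search above still works:

```php
<?php
// Made-up sample data: the keyword has the wrong case and padding around it.
$text = "Lorem ipsum dolor sit amet";
$keyword = "  LOREM ";

// Option 1: normalise both sides, then use the case-sensitive strpos().
$found_normalised = strpos(strtolower($text), strtolower(trim($keyword))) !== false;

// Option 2: stristr() ignores case by itself; the padding still has to be trimmed.
$found_stristr = stristr($text, trim($keyword)) !== false;

var_dump($found_normalised, $found_stristr);
```

Both checks should agree, so explicit case conversion only becomes necessary when switching to a case-sensitive function such as strpos() or str_replace().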


 

So for example, when I search for 'Lorem ipsum' I get the same file, 'lorem.txt', as two results: one showing where 'lorem' is in the text file and one showing where 'ipsum' is. I was hoping for just one result showing where both 'lorem' and 'ipsum' are in that file.

 

 

I'm having a problem with looping it correctly.


Just some thoughts...

<?PHP
/* the search words */
$words1 = array ("rum", "wine", "oil", "Rum", "Wine", "Oil");
$c = count($words1);
$words2 = array();

/* remove spaces then add spaces */
$i = 0;
while($i<$c) {
$words1[$i] = " " . trim($words1[$i]) . " ";
/* closing tags now mirror the opening order */
$words2[$i] = " <font color=#ff0000><b><u><i>" . trim($words1[$i]) . "</i></u></b></font> ";
$i ++;
}

/* create the search function */
function IsItThere($file,$word) {
	if(stristr($file, $word) === FALSE) {
		// string needle NOT found in haystack
		return "bad";
	}
	// needle found
	return "good";
}

/* set dir to read */
$dir = "words/*"; 

$y=0;
$good_files = array(); /* stays empty if no file contains all the words */

/* begin looping thru the files */
foreach(glob($dir) as $file) { 
$good = "good";
$file_content = file_get_contents($file);
$i=0;
while($i<$c) {
	$wword = $words1[$i];
	$test = IsItThere($file_content,$wword);
	if($test == "bad") {
		$good = "bad";
	}
	$i ++;
}
if($good == "good") {
	$good_files[$y] = $file_content;
	$y ++;
}
} 
/* if there are any good files loop thru them highlighting the words and display the file content */
$num_files = count($good_files);
$i=0;
while($i<$num_files) {
$good_files[$i] = str_replace($words1, $words2, $good_files[$i]);
echo $good_files[$i] . "<hr>";
$i ++;
}

?>
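Since the original question asked for one result per file with every keyword highlighted, another option is to highlight all keywords in a single pass with one case-insensitive pattern. This is only a sketch: `highlight_keywords()` is a hypothetical helper, and it assumes plain-text files.

```php
<?php
// highlight_keywords() is a hypothetical helper, not a built-in PHP function.
// It wraps every match of any keyword in <b> tags, ignoring case.
function highlight_keywords(string $text, array $keywords): string
{
    $parts = array();
    foreach ($keywords as $word) {
        if ($word !== '') {
            $parts[] = preg_quote($word, '/'); // escape regex metacharacters
        }
    }
    $safe = htmlspecialchars($text); // escape the file text before output
    if (count($parts) === 0) {
        return $safe;
    }
    // One alternation pattern, e.g. /(cat|road)/i, so the text is scanned once.
    $pattern = '/(' . implode('|', $parts) . ')/i';
    return preg_replace($pattern, '<b>$1</b>', $safe);
}

echo highlight_keywords("The cat ran across the road", array("cat", "road"));
```

A file counts as a hit if the returned string differs from the escaped input, so each file is printed once with all of its matches marked. One limitation: the pattern runs on the escaped text, so keywords containing characters such as & or < will not match, and a keyword like "amp" could match inside an entity.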
