Topic: highlighting search terms

michaellunsford (Topic starter)
« on: January 17, 2007, 02:56:41 PM »
Well, I started this in the regular PHP section, but it no longer fits there. Suffice it to say, I'm trying to take the individual search terms that are being $_POSTed and highlight them in the search results.

The original post talked about using str_replace to handle this. New problem, though: the same search terms also show up inside HTML tags (like <img src="search_term">).

I'm trying "/\b(?!<.+?>)search_term\b/" -- but it's still finding "search_term" inside <img src="search_term">.

Thanks!

effigy
« Reply #1 on: January 17, 2007, 03:45:32 PM »
Either separate the tags from the content and process, or just analyze the non-tagged content:

Code:
<pre>
<?php
$tests = array(
    '<img src="search_term">',
    '<a>search_term</a>',
    '<a>Xsearch_termX</a>',
);

$term = 'search_term';
echo "Searching for <b>$term</b>...<br>";
foreach ($tests as $test) {
    echo htmlspecialchars($test), ' => ';
    // Run the inner replacement only on ">...<" runs, i.e. the text between tags.
    $test = preg_replace_callback(
        '/>(.+?)</',
        create_function(
            '$matches',
            'return preg_replace("/\b(' . preg_quote($term) . ')\b/", "<b>\\\1</b>", $matches[0]);'
        ),
        $test
    );
    echo htmlspecialchars($test), '<br>';
}
?>

</pre>
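
For anyone puzzled by where $matches comes from: preg_replace_callback() calls the callback once per match and passes the match array in as its only argument. Below is a minimal sketch of the same callback written as an anonymous function, assuming PHP 5.3 or later (create_function() was deprecated in PHP 7.2 and removed in PHP 8.0). The helper name highlight_between_tags() is made up for illustration, and the search pattern is left unchanged.

```php
<?php
// Hypothetical helper; same outer pattern as the create_function() version.
function highlight_between_tags($html, $term)
{
    $inner = '/\b(' . preg_quote($term, '/') . ')\b/';
    return preg_replace_callback(
        '/>(.+?)</',
        function ($matches) use ($inner) {
            // $matches[0] is the whole ">...<" run, $matches[1] the text inside it.
            return preg_replace($inner, '<b>$1</b>', $matches[0]);
        },
        $html
    );
}

$tests = array('<img src="search_term">', '<a>search_term</a>', '<a>Xsearch_termX</a>');
foreach ($tests as $test) {
    echo htmlspecialchars(highlight_between_tags($test, 'search_term')), "\n";
}
```

The closure receives $term through the $inner pattern captured with use, so nothing has to be spliced into a string of PHP code.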

« Last Edit: January 17, 2007, 03:46:58 PM by effigy »
Regexp | Unicode Article | Letter Database
/\A(e)?((1)?ff(?:(?:ig)?y)?|f(?:ig)?)\z/

michaellunsford (Topic starter)
« Reply #2 on: January 17, 2007, 07:21:28 PM »
Wow, that's intense. All kinds of functions I've never heard of. In fact after reading the manual page on create_function(), I still don't understand it. $matches doesn't exist anywhere outside of create_function()???

Anyway, it's doing the same thing my original preg_replace() was doing (finding matches inside tags). Here's the new function, minus the array stuff:
Code:
<?php
$test = "<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";
$term = 'somestring';
$test = preg_replace_callback(
    '/>(.+?)</',
    create_function(
        '$matches',
        'return preg_replace("/\b(' . preg_quote($term) . ')\b/", "<span style=\"background:#FF0;\">\\\1</span>", $matches[0]);'
    ),
    $test
);
echo $test;
?>

and the output

Code:
<table>
<tr>
        <td><img src="<span style="background:#FF0;">somestring</span>.jpg" alt=""></td>
</tr>
<tr>
        <td><span style="background:#FF0;">somestring</span></td>
</tr>
</table>

michaellunsford (Topic starter)
« Reply #3 on: January 17, 2007, 07:28:47 PM »
hey, check this out...
Code:
<?php
$test = "<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";
preg_match("/>(.+?)</", $test, $new_array);
print_r($new_array);
?>

result:
Code:
Array
(
    [0] => ><img src="somestring.jpg" alt=""><
    [1] => <img src="somestring.jpg" alt="">
)

effigy
« Reply #4 on: January 17, 2007, 07:45:21 PM »
Ah, of course. Change />(.+?)</ to />(.*?)</.
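
A small sketch of why the quantifier matters, using a shortened one-line version of the markup from Reply #2 (the shortening is an assumption made to keep the output readable). With .+? the run after ">" must contain at least one character, so a match that starts at the ">" of <td> swallows the following <img> tag. With .*? the empty run "><" matches first, and the tag is never captured.

```php
<?php
$html = '<td><img src="somestring.jpg" alt=""></td><td>somestring</td>';

preg_match_all('/>(.+?)</', $html, $plus);
preg_match_all('/>(.*?)</', $html, $star);

print_r($plus[1]); // the captures include tag text
print_r($star[1]); // empty strings between adjacent tags, then the real text
```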

michaellunsford (Topic starter)
« Reply #5 on: January 17, 2007, 09:01:56 PM »
First, let me say it works great!

However, I don't understand why />(.*?)</ is returning ">somestring<" instead of just "somestring" ???

effigy
« Reply #6 on: January 17, 2007, 09:51:34 PM »
Where? The entire match includes the angle brackets, but the 1st capture does not. It depends on which part you're using.
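
A minimal sketch of the difference, assuming PHP 5.3+ for the anonymous function. Index 0 of the match array is always the whole match and index 1 is the first parenthesised group, so a callback that works on the capture has to put the brackets back itself.

```php
<?php
preg_match('/>(.*?)</', '<td>somestring</td>', $m);
echo $m[0], "\n"; // whole match, angle brackets included
echo $m[1], "\n"; // first capture only

// Highlight inside the capture and re-add the brackets by hand.
$out = preg_replace_callback(
    '/>(.*?)</',
    function ($matches) {
        return '>' . str_replace('somestring', '<b>somestring</b>', $matches[1]) . '<';
    },
    '<td>somestring</td>'
);
echo $out, "\n";
```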

michaellunsford (Topic starter)
« Reply #7 on: January 17, 2007, 09:59:37 PM »
here's what />(.*?)</ matches in the example:
Code:
    [0] => Array
        (
            [0] => ><
            [1] => ><
            [2] => > <
            [3] => >somestring<
            [4] => > <
        )

    [1] => Array
        (
            [0] =>
            [1] =>
            [2] => 
            [3] => somestring
            [4] => 
        )

effigy
« Reply #8 on: January 17, 2007, 10:13:25 PM »
A better alternative might be /(?<=>)([^<]+)/.
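
A rough sketch of how that pattern could slot into the highlighter, assuming PHP 5.3+; the function name highlight_text_runs() is hypothetical. Because the lookbehind consumes nothing and [^<]+ stops before the next "<", the whole match is already pure text and no brackets need to be restored.

```php
<?php
// Hypothetical helper: highlight $term only in runs of text that follow a ">".
function highlight_text_runs($html, $term)
{
    $inner = '/\b(' . preg_quote($term, '/') . ')\b/';
    return preg_replace_callback(
        '/(?<=>)([^<]+)/',
        function ($matches) use ($inner) {
            return preg_replace($inner, '<b>$1</b>', $matches[1]);
        },
        $html
    );
}

$test = "<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n\t<td>somestring</td>\r\n</tr>";
echo highlight_text_runs($test, 'somestring'), "\n";
```

One limitation of this sketch: text that comes before the first tag has no ">" in front of it, so it is never touched.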

michaellunsford (Topic starter)
« Reply #9 on: January 17, 2007, 11:31:21 PM »
Dude, it's magic. One more thing that is not quite working right: MySQL is returning case-insensitive results. Is there any way to make '/\b(' . preg_quote($newstring) . ')\b/' case insensitive?

Also, if you're willing to educate, I'm struggling to follow this part of the code. (?<=>) is a lookbehind for ">"? It's working perfectly to locate the text I'm searching for -- even text that isn't preceded by ">". I don't get it, is lookbehind optional?

effigy
« Reply #10 on: January 18, 2007, 01:03:29 AM »
Yes, put an "i" after the closing delimiter: /pattern/i

You are correct about the lookbehind; however, they are not optional (by default). Why do you think it is matching text that isn't preceded by ">"?
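
A quick sketch of the modifier in use. The sample sentence is invented; because the replacement reuses the captured text, each hit keeps its original casing.

```php
<?php
$term    = 'somestring';
$pattern = '/\b(' . preg_quote($term, '/') . ')\b/i';

$result = preg_replace($pattern, '<b>$1</b>', 'Somestring, SOMESTRING and somestring');
echo $result, "\n";
```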
« Last Edit: January 18, 2007, 01:05:10 AM by effigy »

michaellunsford (Topic starter)
« Reply #11 on: January 18, 2007, 08:37:35 AM »
It's finding the search term, even after newlines. So, I guess (?<=>) evades newlines, too?

effigy
« Reply #12 on: January 18, 2007, 10:24:06 AM »
([^<]+) is capturing the CRs and NLs, and (?<=>) is still anchoring at ">". Observe:

Code:
<pre>
<?php
$test = "<table>\r\n<tr>\r\n\t<td><img src=\"somestring.jpg\" alt=\"\"></td>\r\n</tr>\r\n<tr>\r\n\t<td>somestring</td>\r\n</tr>\r\n</table>\r\n";
preg_match_all('/(?<=>)([^<]+)/', $test, $matches);
// Map real CR/LF characters to a visible '\r' and '\n' so they show up in the dump.
$replace = array(
    "\n" => '\n',
    "\r" => '\r',
);
foreach ($matches as &$array) {
    foreach ($array as &$match) {
        $match = preg_replace('/([\r\n])/e', '$replace["\1"]', $match);
    }
}
print_r($matches);
?>

</pre>
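
The /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so here is a sketch of the same inspection using strtr() instead. The shortened markup and the extra "\t" mapping are assumptions added for illustration.

```php
<?php
$test = "<tr>\r\n\t<td>somestring</td>\r\n</tr>";
preg_match_all('/(?<=>)([^<]+)/', $test, $matches);

// Make the control characters visible instead of evaluating code in the replacement.
$visible = array_map(
    function ($match) {
        return strtr($match, array("\r" => '\r', "\n" => '\n', "\t" => '\t'));
    },
    $matches[1]
);
print_r($visible);
```

The whitespace runs show up as captures of their own, each one starting right after a ">".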


michaellunsford (Topic starter)
« Reply #13 on: January 18, 2007, 10:32:07 AM »
ahhh... I should have known that :-[ all this late night programming must be killing brain cells :)

fataqui
« Reply #14 on: December 31, 2007, 06:03:07 AM »
I do like this...

<?php
$arrayofwords = array();
$arrayofwords[0] = "This";
$arrayofwords[1] = "text";
$arrayofwords[2] = "need";
$arrayofwords[3] = "words";

$str = 'This is my <img src="" title="This image text"> long text <a href="#">words</a> where I need to highlight words in the HTML text.';

// The lookahead is meant to skip positions inside a tag or inside link text;
// after that, any of the listed words is matched as a whole word.
$str = preg_replace("/(?!(?:[^<]+>|[^>]+<\/a>))\b(" . implode('|', $arrayofwords) . ")\b/is", "<strong>\\1</strong>", $str);

echo $str;
?>
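
For comparison, a sketch of the other route mentioned in Reply #1: separate the tags from the content and process only the content. It assumes PHP 5.3+, the helper name highlight_words() is hypothetical, and unlike the regex above it also highlights words inside link text.

```php
<?php
// Hypothetical helper: highlight whole words in text nodes only.
function highlight_words($html, array $words)
{
    $quoted  = array_map(function ($w) { return preg_quote($w, '/'); }, $words);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';

    // With PREG_SPLIT_DELIM_CAPTURE, odd-numbered parts are tags and
    // even-numbered parts are the text between them.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = preg_replace($pattern, '<strong>$1</strong>', $part);
        }
    }
    return implode('', $parts);
}

$str = 'This is my <img src="" title="This image text"> long text <a href="#">words</a> where I need to highlight words in the HTML text.';
echo highlight_words($str, array('This', 'text', 'need', 'words')), "\n";
```

The title attribute of the image is left alone because it sits inside a tag part, which is never passed to preg_replace().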

lwc
« Reply #15 on: July 28, 2008, 07:32:40 PM »
Of course, your code only works on sites with strictly Latin words. Otherwise, see this.
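
One possible way around that, offered as a sketch and not as what the linked post does: switch the pattern to UTF-8 mode with the u modifier and spell out the word boundary with Unicode property classes. It assumes the text and the source file are UTF-8 and that PCRE was built with Unicode property support; the function name highlight_unicode() is made up, and the sketch works on plain text without any tag handling.

```php
<?php
// Hypothetical helper: whole-word highlight that treats any Unicode letter,
// digit or underscore as a word character.
function highlight_unicode($text, $term)
{
    $pattern = '/(?<![\p{L}\p{N}_])(' . preg_quote($term, '/') . ')(?![\p{L}\p{N}_])/u';
    return preg_replace($pattern, '<b>$1</b>', $text);
}

echo highlight_unicode('привет мир', 'мир'), "\n";
echo highlight_unicode('мирный день', 'мир'), "\n";
```

The lookarounds play the role of \b here, so a term that is only the start of a longer word is left alone.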

ddrudik
« Reply #16 on: October 28, 2008, 10:43:03 PM »
I would suggest:
search_term(?!(?=[^<>]*>))
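
A short sketch of that lookahead at work, with an invented sample string. The idea is that a term is skipped when a ">" comes up before any "<" or ">" on the way forward, which is the case inside a tag. The inner (?=...) adds nothing, so (?![^<>]*>) behaves the same way.

```php
<?php
$html = '<img src="search_term"> search_term';

$short    = '/search_term(?![^<>]*>)/';
$original = '/search_term(?!(?=[^<>]*>))/';

$a = preg_replace($short, '<b>$0</b>', $html);
$b = preg_replace($original, '<b>$0</b>', $html);
echo $a, "\n";

// Caveat: a literal ">" later in plain text also blocks the match.
$c = preg_replace($short, '<b>$0</b>', 'search_term > 5');
echo $c, "\n";
```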

ddrudik
« Reply #17 on: October 30, 2008, 04:43:26 AM »
Also, if you want to exclude search_term from within head/script/a blocks as well as from within tags:
Code:
$html=preg_replace_callback('~(<head>.*?</head>|<script\s[^>]*>.*?</script>|<a\s[^>]*>.*?</a>)|search_term(?!(?=[^<>]*>))~is',create_function('$matches','return isset($matches[1]) ? $matches[1] : "<strong>$matches[0]</strong>" ;'),$html);
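
The same idea as a sketch with an anonymous function in place of create_function(), assuming PHP 5.3+. The wrapper name highlight_outside_blocks() and the use of preg_quote() on the term are additions for illustration; the block patterns are copied from the one-liner. When the first alternative matches, group 1 is set and the block is returned untouched; otherwise only the bare term matched and it gets wrapped.

```php
<?php
// Hypothetical wrapper around the pattern above.
function highlight_outside_blocks($html, $term)
{
    $pattern = '~(<head>.*?</head>|<script\s[^>]*>.*?</script>|<a\s[^>]*>.*?</a>)|'
             . preg_quote($term, '~') . '(?![^<>]*>)~is';

    return preg_replace_callback(
        $pattern,
        function ($matches) {
            // Group 1 is only present when a whole head/script/a block matched.
            return isset($matches[1]) ? $matches[1] : '<strong>' . $matches[0] . '</strong>';
        },
        $html
    );
}

echo highlight_outside_blocks('<a href="#">search_term</a> and search_term', 'search_term'), "\n";
```

As in the original pattern, a script tag is only recognised when it carries at least one attribute, because of the \s after "<script".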


lwc
« Reply #18 on: December 16, 2008, 02:16:49 PM »
Quote from fataqui: "I do like this..."
I like this too. But can you deal with "&lt;" and "&gt;" too?