
Word between tabs


Satanas


Hi guys!

 

I'm trying to get some words between tabs but with no result...

 

Here's an example... I want to get the country name in it, but because of the tab characters I'm getting no result.

I've tried \s, \t and \n.

 

 

		<h3>00CA - Goldstone (GTS)</h3>
		United States 
	</td>

 

Any help?

Thanks.


With such a small context provided...

 

<pre>
<?php
$data = <<<DATA
		<h3>00CA - Goldstone (GTS)</h3>
		United States 
	</td>
DATA;

preg_match('%</h3>(.+?)</td>%s', $data, $matches);
print_r($matches);
?> 
</pre>
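
Since the original question was about the tabs around the country name, here is a minimal sketch of one way to keep them out of the capture. It assumes the sample markup from the first post, written as a double-quoted string so the tabs and newlines are explicit; the \s* on each side of the group is an addition to the pattern above, not part of it.

```php
<?php
// Sample markup from the question, with tabs (\t) and newlines (\n) spelled out.
$data = "\t\t<h3>00CA - Goldstone (GTS)</h3>\n\t\tUnited States \n\t</td>";

// \s* on either side of the group absorbs the surrounding newlines,
// tabs and spaces, so the group holds only the text itself.
preg_match('%</h3>\s*(.+?)\s*</td>%s', $data, $matches);

$country = $matches[1];
var_dump($country); // string(13) "United States"
```

Calling trim($matches[1]) on the result of the original pattern should give the same value, if you would rather leave the pattern untouched.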

 

Hi there effigy!

 

First of all... thanks for your help...

Sorry for the small amount of context. What else do you need to help me?

 

Thanks,

 


The code I provided does not work?

 

Nope. Here's what I'm trying to do...

 

I have a database of users whose countries I need to update.

I have access to a web page that shows the country I want to get, so...

 

$user_id = 544;
$texto = file_get_contents("http://www.mydomain.com/users.php?uid=$user_id");

preg_match('%</h3>(.+?)</td>%s', $texto, $matches);
print_r($matches);

 

The code you provided gives me the whole page contents, not only the country.

 

???

 

Thanks once more.


effigy, I gave your code snippet a shot and it worked (I typically echo out $matches[0] though).

I do have one question..

 

preg_match('%</h3>(.+?)</td>%s', $data, $matches);

 

I noticed the .+? segment.

From what I read here:

http://www.regular-expressions.info/reference.html

 

the explanation is:

'Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item.' I looked at the example given, but I'm still not sure I follow.

 

Can you give another simple example of when it needs to increase matches through further permutations? This would be much appreciated. I'm slightly confused by this.

 

Cheers,

 

NRG


You can see the problem greediness creates by adding more data and modifying the pattern:

 

<pre>
<?php
$data = <<<DATA
	<td>
		<h3>00CA - Goldstone (GTS)</h3>
		United States 
	</td>
	<td>
		<h3>00CA - Goldstone (GTS)</h3>
		United States 
	</td>
DATA;

preg_match('%</h3>(.+)</td>%s', $data, $matches);
print_r($matches);
?> 
</pre>

 

Combined with the /s modifier, the dot (.) matches any character, including newlines. When greedy, it has no concern for the patterns that follow until it is done. Therefore, (.+) first matches the rest of the string, then comes to </td> and realizes that it needs to give back what it matched one character at a time (backtrack) in order to finish the match. Effectively, this means the match ends at the last </td> in the data rather than the first.

 

Laziness, on the other hand, takes one character, checks whether the following pattern can match at that point, and repeats the process if it cannot. For example, when (.+?) reaches the "U" it takes it, then checks whether "</td>"* is next; it's not, so it grabs the "n", checks, then the "i", checks, and so forth, all the way through the tab before "</td>".

 

* Actually, it's only going to make sure "<" isn't next. If it is, then it would look for "/" and so forth. The same applies throughout: "</td>" is not an atomic unit as far as the regex is concerned. It deals with the characters one at a time.
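
To see the difference side by side, here is a rough sketch that runs the greedy and the lazy pattern against the same two-cell data. The variable names ($cell, $greedy, $lazy) are made up for the illustration, and the data is the sample from this thread written as a double-quoted string.

```php
<?php
// One table cell from the thread, repeated twice.
$cell = "\t<td>\n\t\t<h3>00CA - Goldstone (GTS)</h3>\n\t\tUnited States \n\t</td>\n";
$data = $cell . $cell;

// Greedy: (.+) runs to the end of the data and backtracks to the LAST </td>.
preg_match('%</h3>(.+)</td>%s', $data, $greedy);

// Lazy: (.+?) grows one character at a time and stops at the FIRST </td>.
preg_match('%</h3>(.+?)</td>%s', $data, $lazy);

// The greedy capture spans both cells, so it contains both country names
// plus the first </td> and the second <h3>.
var_dump(substr_count($greedy[1], 'United States')); // int(2)

// The lazy capture holds only the first country, padded with whitespace.
var_dump(trim($lazy[1])); // string(13) "United States"
```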


Thanks for the response, effigy.

 

I think I understand now (although, admittedly, using the 'here document' and HTML sample with tags might not be the best example as tags are still parsed by the browser).

So if I understand correctly (and feel free to correct me if I'm wrong)..

 

In your last code snippet, when only using (.+) (which is greedy), the match is as follows after the initial </h3>? (don't mind the improper spacing / formatting here...)

 

   United States
</td>
<td>
   <h3>00CA - bluestone (GTS)</h3>
   United States

 

If this is correct, I suppose due to browsers parsing the HTML tags, we only see the following onscreen (which is what I got):

 

United States
00CA - bluestone (GTS)
United States

 

But when using the lazy (.+?), the expression stops once it finds the first occurrence? So after the first </h3>, the system simply finds:

 

United States

 

Since the first condition is met, it doesn't matter what is in the second (otherwise) match of the pattern, as the expression is now lazy and only finds the first occurrence.

Do I have this right?

 

To put it in another example (not using here document or HTML tags):

 

$str = 'there\'s no place like home, as there\'s only one place to call home.';
preg_match('#there\'s(.+)home#', $str, $match);
foreach($match as $val){
   echo $val . '<br />';
}

 

outputs (as an array with two keys / values):

there's no place like home, as there's only one place to call home <-- this is $match[0]
no place like home, as there's only one place to call <-- this is $match[1]

 

And this is because of the greedy nature (no question mark after the plus): it starts from the first "there's" and matches up to the second "home", and thus includes everything in between.

 

But with the (.+?) in use:

preg_match('#there\'s(.+?)home#', $str, $match);

 

I get:

there's no place like home <-- this is $match[0]
no place like <-- this is $match[1]

 

Since it is lazy, it only matches the first occurrence between "there's" and "home" (the first home that is).

On a side note, I didn't realise that you can match a section of characters this way ($match[1]). Prior to this post, I would have thought that one would need to use positive lookbehind and positive lookahead assertions to exclude the words "there's" and "home", but as it turns out, because the (.+?) is in parentheses, this part of the match is put into another key.
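
For comparison, here is a small sketch of both approaches on the same string. The lookaround version is my own assumption of what that alternative would look like (it is not taken from any post above), and the variable names are made up for the example.

```php
<?php
$str = 'there\'s no place like home, as there\'s only one place to call home.';

// Capture group: the wanted text ends up in index 1.
preg_match('#there\'s(.+?)home#', $str, $withGroup);

// Lookbehind / lookahead: "there's" and "home" are checked but not consumed,
// so the wanted text is the whole match in index 0.
preg_match('#(?<=there\'s).+?(?=home)#', $str, $withLookaround);

var_dump($withGroup[1] === $withLookaround[0]); // bool(true)
```

Both give the same text here (including the spaces on either side), so the capture group is usually the simpler choice.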

 

This is an eye opener.. makes me see things a little differently now. Hope I got all this right.

 

Cheers,

 

NRG


  • Correct. Although, I want to clarify what you mentioned about the expression stopping. Yes, the lazy portion stops matching data once it is fulfilled and the following expressions (if any) are satisfied, but the expression as a whole matches only once (stops) because this is the behavior of preg_match. One must use preg_match_all to match every instance of the pattern.
     
  • Adding this before print_r should be helpful:

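    foreach ($matches as &$match) {
        // Escape the tags so the browser shows them instead of parsing them.
        $match = htmlspecialchars($match);
    }
    unset($match); // break the reference left over from the loop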

     

  • Per the docs, index 0 is the full match, while indexes 1 and above are the individual parenthetical captures.
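
As a quick sketch of the preg_match_all point above, this runs against the two-cell sample data from earlier in the thread. The \s* around the group and the $cell / $countries names are my additions for the illustration.

```php
<?php
// One table cell from the thread, repeated twice.
$cell = "\t<td>\n\t\t<h3>00CA - Goldstone (GTS)</h3>\n\t\tUnited States \n\t</td>\n";
$data = $cell . $cell;

// preg_match_all keeps going after the first match. With the default
// PREG_PATTERN_ORDER flag, $matches[0] lists the full matches and
// $matches[1] lists what the first parenthesized group captured each time.
$count = preg_match_all('%</h3>\s*(.+?)\s*</td>%s', $data, $matches);

// Escape the captures before echoing them into an HTML page.
$countries = array_map('htmlspecialchars', $matches[1]);

print_r($countries);
```

With this data, $count should be 2 and $countries should hold "United States" twice.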

