DOMDocument and XPath: running XPath queries to select child nodes?


Prismatic

So here's what I'm trying to do. I haven't found any clear tutorials on how to properly navigate a DOMDocument object, at least not ones specific to PHP.

 

I'm building a web scraper, and I've had it working for some time now using more traditional methods (a combination of string manipulation and clever regex). I've been told XPath can be much faster and more reliable for what I need. Sold.

 

Let's say I'm parsing a forum. This forum wraps each reply in a thread in its own <li></li> element with a class of "message":

 

<li class="message">
    // Stuff here
</li>

<li class="message">
    // Stuff here
</li>

 

So far so good. These list items contain all the formatting for each post, including user info and the message text, each sitting in its own div.

 

 

<li class="message">
<div class="user info">
	User info here
</div>

<div class="message text">
	Message text here	
</div>
</li>

<li class="message">
<div class="user info">
	User info here
</div>

<div class="message text">
	Message text here	
</div>
</li>

 

Still with me? Good.

 

With this bit of code I can select each message list item and print the combined text of everything inside it.

 

$items = $xpath->query("//li[starts-with(@class, 'message')]");

for ($i = 0; $i < $items->length; $i++) {
   echo $items->item($i)->nodeValue . "\n";
}

 

 

This produces a basic text dump of the entire forum. Close, but not what I need.
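For completeness, here is a minimal, self-contained sketch of how the snippet above could be run end to end. The sample markup, the names "Alice" and "Bob", and the `$dump` array are made up for illustration; the setup uses only the standard `DOMDocument` and `DOMXPath` classes, and the real scraper presumably loads its HTML from somewhere else.

```php
<?php
// Hypothetical sample markup mirroring the structure described above.
$html = <<<HTML
<ul>
<li class="message">
<div class="user info">Alice</div>
<div class="message text">First reply</div>
</li>
<li class="message">
<div class="user info">Bob</div>
<div class="message text">Second reply</div>
</li>
</ul>
HTML;

$doc = new DOMDocument();
// Scraped HTML is rarely well-formed, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$items = $xpath->query("//li[starts-with(@class, 'message')]");

// nodeValue concatenates the text of every descendant node, which is
// why the output is one undifferentiated blob per list item.
$dump = [];
foreach ($items as $item) {
    $dump[] = trim($item->nodeValue);
}

echo implode("\n---\n", $dump) . "\n";
```

Since `nodeValue` on an `<li>` returns the text of all its descendants joined together, the user info and the message text end up in the same string, which matches the "text dump" behaviour.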

 

What I'm trying to do is as follows:

 

  • Select all the class="message" list items [done]
  • Once those have been selected, run another $xpath->query to select the child nodes which contain the user info and message text

 

Step one is done; step two is what's confusing me. How can I run a new query based on the output from the first query?
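One approach that may fit here, sketched under assumptions: `DOMXPath::query()` accepts an optional second argument, a context node, and an expression that starts with `.` is then evaluated relative to that node. The sample markup, the names in it, and the `$posts` array below are hypothetical, and the exact-match `@class='user info'` predicates assume the class attributes appear exactly as in the example markup.

```php
<?php
// Hypothetical sample markup mirroring the structure described above.
$html = <<<HTML
<ul>
<li class="message">
<div class="user info">Alice</div>
<div class="message text">First reply</div>
</li>
<li class="message">
<div class="user info">Bob</div>
<div class="message text">Second reply</div>
</li>
</ul>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$posts = [];
foreach ($xpath->query("//li[starts-with(@class, 'message')]") as $item) {
    // The second argument is the context node. The leading "." keeps the
    // path relative to it; an expression starting with "//" would search
    // the whole document again, regardless of the context node.
    $user = $xpath->query("./div[@class='user info']", $item)->item(0);
    $text = $xpath->query("./div[@class='message text']", $item)->item(0);

    $posts[] = [
        'user' => $user ? trim($user->nodeValue) : null,
        'text' => $text ? trim($text->nodeValue) : null,
    ];
}

print_r($posts);
```

Each pass through the loop runs the two inner queries against a single list item, so the user info and message text stay paired per post rather than being merged into one string.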

 

Thanks guys
