
Working with PHP's DOM functions


Arty Ziff


I'm using PHP's DOM functions to extract some information from blocks of HTML:

for ($i = 0; $i < $tot_tblocks; $i++) {
    // Load the HTML block...
    $dom = new DOMDocument();
    $dom->loadHTML($tblock[$i]);
    $xpath = new DOMXPath($dom);
    $tags = $xpath->query('//div[@class="desc"]/h2[@class="name"]');
    // Get the part I want...
    foreach ($tags as $tag) {
        $tname[$i] = trim($tag->nodeValue);
        echo $tname[$i] . "<br>";
    }
}

Two questions:

1 - There is only ever one item ("name"). Can I access it without the foreach loop? $tname[$i] = trim($tags->nodeValue); doesn't work.

2 - This code extracts the content between tags with certain class names. I would also like to extract the values of certain attributes of some of those tags, for example the href of an <a> tag. The tag may not have a unique class name, but could I still get an array of all the <a> tags in the source block? I don't know how, and I haven't managed to decipher the documentation... Any ideas?


1. query() always returns a list of nodes, even when there is only one match. It is a DOMNodeList object rather than a PHP array, so take the first one with item(0), like

$tag = $xpath->query(...)->item(0);
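Since query() hands back a DOMNodeList (which is why $tags->nodeValue fails: the list itself has no nodeValue), a minimal sketch of reading the single "name" without a loop might look like this. The sample HTML string is an assumption standing in for one $tblock[$i]. It also shows DOMXPath::evaluate() combined with XPath's string() function, which returns the text of the first match directly as a PHP string.

```php
<?php
// Collect libxml parse warnings instead of printing them (real-world HTML is rarely perfect).
libxml_use_internal_errors(true);

// Hypothetical sample block standing in for one $tblock[$i].
$html = '<div class="desc"><h2 class="name"> Widget </h2></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// query() returns a DOMNodeList; item(0) is the first node, or null if nothing matched.
$node = $xpath->query('//div[@class="desc"]/h2[@class="name"]')->item(0);
$name = ($node !== null) ? trim($node->nodeValue) : '';

// Alternative: string() inside evaluate() yields the first match's text as a PHP string
// (an empty string when nothing matches), so no node handling is needed.
$name2 = trim($xpath->evaluate('string(//div[@class="desc"]/h2[@class="name"])'));

echo $name, "\n";  // Widget
echo $name2, "\n"; // Widget
```

The null check matters if a block ever lacks the element; the string() variant simply gives an empty string in that case.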


2. You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name... Or does that stuff happen outside the code you posted?

What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside.
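Along the lines of "more XPath queries", here is a minimal sketch for pulling attribute values; the markup is hypothetical. A path ending in /@href selects the attribute nodes themselves (each a DOMAttr with a value property), while passing a context node as the second argument to query() limits a relative expression such as .//a to one part of the block, which is useful when you know roughly where the <a> tags sit.

```php
<?php
libxml_use_internal_errors(true); // hide libxml warnings about imperfect markup

// Hypothetical sample block; real markup will differ.
$html = '<div class="desc">'
      . '<h2 class="name">Widget</h2>'
      . '<a href="/widget/1">Details</a>'
      . '<a href="/widget/1/reviews">Reviews</a>'
      . '</div>'
      . '<a href="/elsewhere">Outside</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// 1) Every href in the block: select the attribute nodes directly.
$allHrefs = [];
foreach ($xpath->query('//a/@href') as $attr) {
    $allHrefs[] = $attr->value; // DOMAttr::$value holds the attribute text
}

// 2) Only links inside div.desc: run a relative query from a context node.
$desc = $xpath->query('//div[@class="desc"]')->item(0);
$descHrefs = [];
if ($desc !== null) {
    foreach ($xpath->query('.//a', $desc) as $a) {
        $descHrefs[] = $a->getAttribute('href');
    }
}

print_r($allHrefs);
print_r($descHrefs);
```

Both approaches return plain PHP arrays of strings, so the results can be stored per block just like $tname[$i].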


...You sure that's what it does? Looks to me like it gets the text inside every div.desc>h2.name...
Yes, that's exactly what it does (and there is only one such element in each HTML block being parsed).

What does the HTML look like? A solution is (probably) to use more XPath queries, unless you know the hierarchy of the HTML and where the A nodes fall inside.

The hierarchy is known, but it could change. The tags do (for the most part) have unique class names, though.
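Given that the class names are mostly unique but the nesting may change, one option is to drop the fixed path and match on the class alone, anywhere in the block. One caveat: @class="name" is an exact string comparison, so it stops matching as soon as an element gets a second class. The sketch below uses the common XPath 1.0 idiom contains(concat(' ', normalize-space(@class), ' '), ' name ') to match a single class token; the helper name hasClass() and the sample markup are hypothetical.

```php
<?php
libxml_use_internal_errors(true); // keep parse warnings quiet

// Hypothetical helper: XPath 1.0 predicate matching one token in a space-separated class list.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' " . $class . " ')";
}

// Hypothetical markup: an extra wrapper div and extra classes compared with the original layout.
$html = '<div class="desc wide"><div class="wrap">'
      . '<h2 class="name big">Widget</h2>'
      . '</div></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// The original fixed path no longer matches: different nesting and exact @class comparison.
$strict = $xpath->query('//div[@class="desc"]/h2[@class="name"]')->length;

// Class-based match anywhere in the block, independent of hierarchy.
$loose = $xpath->query('//h2[' . hasClass('name') . ']')->length;
$text  = trim($xpath->evaluate('string(//*[' . hasClass('name') . '])'));

echo $strict, ' ', $loose, ' ', $text, "\n"; // 0 1 Widget
```

The trade-off is precision: a class-only query will also pick up any other element that happens to share the class, so it suits markup where the class really is unique.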


Dcr0 - Works great. Thanks!
