SimpleXML, CDATA and HTML entities

tomccabe · November 18, 2010

I'm tearing my hair out trying to work with "simple" XML for the first time. I'm building a small CMS for a Flash based site and the content is held in an XML file. My problem is that many of the copy fields are XML CDATA fields. On the one hand, with:

$xml = simplexml_load_file($file, 'SimpleXMLElement', LIBXML_NOCDATA);

I can pull the data out of that node and the CDATA tags are stripped. My issues come with trying to save the data with:

file_put_contents($file, $xml->asXML());

Problems are:

a) tags are converted to their HTML entity equivalents. I don't want this to happen as it's a CDATA field being read by Flash. I gather this is coming from the asXML method because even if I run html_entity_decode on the $_POST data it's still being converted.

b) because of the above, there's no way to add the CDATA tags because their characters get converted too.

SimpleXML so far has been anything but simple for me >:( Has anyone ever run into this? Is there a way to save the file back in a way that will just keep my data exactly as is and also allow me to add the CDATA tags back in?

salathe · November 18, 2010

Don't use LIBXML_NOCDATA as that merges CDATA content into XML text nodes, which is why you see the "HTML entity equivalents".
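To illustrate the point, here is a minimal sketch comparing the two load modes. The sample markup is only modelled on the kind of file described in the thread and is an assumption, not the poster's actual data.

```php
<?php
// Sample markup for illustration only (assumed, not the poster's real file).
$source = '<spice><banners><![CDATA[Lots of <u>text</u>]]></banners></spice>';

// With LIBXML_NOCDATA the CDATA section is merged into an ordinary text node,
// so asXML() has to escape the markup characters when it serializes.
$merged = simplexml_load_string($source, 'SimpleXMLElement', LIBXML_NOCDATA);
$mergedXml = $merged->asXML();

// Without the flag the CDATA section is kept as a CDATA node and is
// written back out unchanged.
$kept = simplexml_load_string($source);
$keptXml = $kept->asXML();

echo $mergedXml, "\n", $keptXml, "\n";
```

The first output should contain entity-escaped tags and no CDATA wrapper, while the second should still carry the original CDATA section.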

tomccabe · November 18, 2010

Don't use LIBXML_NOCDATA as that merges CDATA content into XML text nodes, which is why you see the "HTML entity equivalents".

If I don't use that option none of the text from the CDATA sections is pulled out on the load.

salathe · November 18, 2010

Can you show us an example showing that none of the CDATA text is available?
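For reference, a small sketch of one way to check this. It assumes a cut-down sample document rather than the real file. A debug dump of a SimpleXMLElement may not show CDATA content, which can make it look as though nothing was loaded, whereas a string cast returns it.

```php
<?php
// Sample markup for illustration only (assumed, not the poster's real file).
$source = '<spice><banners><![CDATA[Lots of <u>text</u>]]></banners></spice>';

// Load without LIBXML_NOCDATA so the CDATA section stays a CDATA node.
$xml = simplexml_load_string($source);

// The dump may show <banners> as an empty object even though it has content.
print_r($xml);

// Casting the element to a string returns the CDATA text with its tags intact.
$text = (string) $xml->banners;
echo $text, "\n";
```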

salathe · November 18, 2010

Also, older SimpleXML versions didn't like to play nicely with CDATA so it might be worth looking into the DOM classes.
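As a rough sketch of what that could look like without abandoning SimpleXML entirely: dom_import_simplexml() gives access to the same node through the DOM API, where a CDATA section can be created. The element name and replacement text here are made up for the example.

```php
<?php
// Illustrative document; element name and contents are assumptions.
$xml = simplexml_load_string('<spice><banners>old</banners></spice>');

// dom_import_simplexml() exposes the same underlying node as a DOMElement.
$node = dom_import_simplexml($xml->banners);

// Drop the existing children, then append the new copy as a CDATA section.
while ($node->firstChild) {
    $node->removeChild($node->firstChild);
}
$node->appendChild(
    $node->ownerDocument->createCDATASection('New <b>copy</b>')
);

// Both APIs share one document, so SimpleXML sees the change.
$result = $xml->asXML();
echo $result;
```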

tomccabe · November 18, 2010

Also, older SimpleXML versions didn't like to play nicely with CDATA so it might be worth looking into the DOM classes.

Yup, I've been looking at this, but it's all very confusing. I have a deadline of today and the only thing I can't get to function is the CDATA section. Starting to freak out because most of my searching leads me down wildly different paths, some with "createCDATASection", some using an XPath object, some traversing the document with getElementsByTagName... I am able to echo the CDATA section I want to be able to edit, but can't seem to get removeChild to work (granted I'm trying to absorb all of this at a rapid pace). Here's what I have so far:

$doc = new DOMDocument();
$file = "spice.xml";
$doc->load($file);
$xPath = new DOMXPath($doc);
$homesCopy = $xPath->query("//sections/homesCopy");
echo $homesCopy->item(0)->nodeValue;

I get the tags in the CDATA section to output as html here which is great. Now if only I could edit it :shrug:
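One possible way to edit it, sketched under the assumption that the CDATA section sits directly inside a banners element under homesCopy: locate the CDATA child and overwrite its character data in place. The inline document and the replacement text are placeholders.

```php
<?php
$doc = new DOMDocument();
// Inline stand-in for spice.xml (assumed structure).
$doc->loadXML(
    '<spice><sections><homesCopy>'
    . '<banners><![CDATA[Old <u>copy</u>]]></banners>'
    . '</homesCopy></sections></spice>'
);

$xPath = new DOMXPath($doc);
$banners = $xPath->query('//sections/homesCopy/banners')->item(0);

// Find the CDATA child and overwrite its character data in place.
foreach ($banners->childNodes as $child) {
    if ($child instanceof DOMCdataSection) {
        $child->data = 'New <i>copy</i>';
    }
}

$result = $doc->saveXML();
echo $result;
```

Because the node stays a CDATA section, the new markup is serialized without entity escaping.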

salathe · November 18, 2010

... most of my searching leads me down wildly different paths, some with "createCDATASection", some using an XPath object, some traversing the document with getElementsByTagName... I am able to echo the CDATA section I want to be able to edit, but can't seem to get removeChild to work (granted I'm trying to absorb all of this at a rapid pace). ...

Can you outline what you're trying to do? You mention lots of things but not what's going on.

tomccabe · November 18, 2010

$doc = new DOMDocument();
$file = "spice.xml";
$doc->load($file);
$homesCopy = $doc->getElementsByTagName("homesCopy")->item(0);
$banners = $homesCopy->firstChild;
$doc->removeChild($banners);

"Argument 1 passed to DOMNode::removeChild() must be an instance of DOMNode"
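The exact cause in the real file is not shown, but two things in a snippet like this commonly go wrong: firstChild is often a whitespace text node rather than the element, and removeChild() has to be called on the node's actual parent rather than on the document. A minimal sketch, assuming a structure similar to the one discussed in this thread:

```php
<?php
$doc = new DOMDocument();
// Inline stand-in for spice.xml, with the whitespace a hand-edited file has.
$doc->loadXML(
    "<spice><sections><homesCopy>\n\t"
    . "<banners><![CDATA[Some <u>copy</u>]]></banners>\n"
    . "</homesCopy></sections></spice>"
);

$homesCopy = $doc->getElementsByTagName('homesCopy')->item(0);

// firstChild here is the whitespace text node before <banners>.
$first = $homesCopy->firstChild;

// Ask for the <banners> element explicitly instead.
$banners = $homesCopy->getElementsByTagName('banners')->item(0);

// Remove it through its real parent, not through $doc.
$banners->parentNode->removeChild($banners);

$remaining = $homesCopy->getElementsByTagName('banners')->length;
echo $doc->saveXML();
```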

tomccabe · November 18, 2010

Can you outline what you're trying to do? You mention lots of things but not what's going on.

I have an XML file like such:

<spice>
<sections>
	<contact></contact>
	<products></products>
	<people></people>
	<homes></homes>
	<harvest></harvest>
	<contactCopy>
		<banners><![CDATA[Lots of <u>text</u> with tons of <a href="#">links</a> and <i>other</i> markup.]]></banners>
	</contactCopy>
	<homesCopy>
		<banners><![CDATA[Lots of <u>text</u> with tons of <a href="#">links</a> and <i>other</i> markup.]]></banners>
	</homesCopy>
	<productsCopy>
		<banners><![CDATA[Lots of <u>text</u> with tons of <a href="#">links</a> and <i>other</i> markup.]]></banners>
	</productsCopy>
	<peopleCopy>
		<banners><![CDATA[Lots of <u>text</u> with tons of <a href="#">links</a> and <i>other</i> markup.]]></banners>
	</peopleCopy>
	<harvestCopy>
		<banners><![CDATA[Lots of <u>text</u> with tons of <a href="#">links</a> and <i>other</i> markup.]]></banners>
	</harvestCopy>
</sections>
</spice>

What I need to be able to do is pull the CDATA content out, WITH the tags intact, have the client edit the content with a markup editor and put unentitied HTML back into the CDATA section.
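A minimal sketch of that round trip with the DOM classes, assuming the structure shown above. The helper name setBannerCopy is hypothetical, and the document is loaded from an inline string so the example is self-contained.

```php
<?php
// Hypothetical helper: replaces the contents of <banners> inside the named
// section with a single CDATA section holding $html.
function setBannerCopy(DOMDocument $doc, string $section, string $html): void
{
    $sectionNode = $doc->getElementsByTagName($section)->item(0);
    if ($sectionNode === null) {
        throw new InvalidArgumentException("No <$section> element found");
    }
    $banners = $sectionNode->getElementsByTagName('banners')->item(0);
    if ($banners === null) {
        throw new InvalidArgumentException("No <banners> in <$section>");
    }
    // Remove whatever is there, then append the new copy as CDATA.
    while ($banners->firstChild) {
        $banners->removeChild($banners->firstChild);
    }
    $banners->appendChild($doc->createCDATASection($html));
}

$doc = new DOMDocument();
// Inline stand-in for spice.xml (assumed structure).
$doc->loadXML(
    '<spice><sections><homesCopy>'
    . '<banners><![CDATA[Lots of <u>text</u>]]></banners>'
    . '</homesCopy></sections></spice>'
);

// Reading: nodeValue returns the CDATA text with its tags intact.
$current = $doc->getElementsByTagName('homesCopy')->item(0)
    ->getElementsByTagName('banners')->item(0)->nodeValue;

// Writing: store the edited markup back as a CDATA section.
setBannerCopy($doc, 'homesCopy', 'Edited <a href="#">copy</a>');
$saved = $doc->saveXML();
echo $saved;
```

With a real file, $doc->load($file) and $doc->save($file) would take the place of loadXML() and saveXML().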

Sorry for being all over the place! I've been up all night trying to solve this stupid problem for a stupid Flash site.
