
xml_parse eats < and > from cdata


jørgenj


I am trying to debug a problem with SimplePie (an RSS/Atom feed parser) as used in Joomla! 1.5.14 (the latest version). I have identical, out-of-the-box installations on different hosting providers, and on some of them I see a strange problem with the XML parser that SimplePie uses. By the way, I have absolutely no experience with PHP's XML parser.

 

On some installations, xml_parse drops the '<' and '>' characters that the &lt; and &gt; entities in the character data should produce (which is not good when the feed description is sent to the browser). On other installations, '&lt;' and '&gt;' are translated to '<' and '>' as expected.

 

As far as I can see, the installations are identical except for the libxml and PHP versions: libxml 2.6.27 (PHP 5.2.10) on the installations that work, and libxml 2.7.3 (PHP 5.2.8) on the installations that have problems.

 

The parser is created with xml_parser_create_ns (encoding = UTF-8, separator = ' '), with XML_OPTION_SKIP_WHITE = 1 and XML_OPTION_CASE_FOLDING = 0.
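To reproduce the fragment logs below outside SimplePie, a parser set up as described can be given a character data handler that just records what it receives. This is a minimal sketch, not SimplePie's code: it assumes PHP 7 or later (closures as handlers, scalar type hints), and collect_cdata_fragments() is a hypothetical helper name.

```php
<?php
// Minimal sketch (assumes PHP 7+): a parser configured as described above,
// with a character data handler that records every fragment it is given.
// collect_cdata_fragments() is a hypothetical helper, not part of SimplePie.
function collect_cdata_fragments(string $xml): array
{
    $fragments = [];

    // Namespace-aware parser with encoding 'UTF-8' and ' ' as separator.
    $parser = xml_parser_create_ns('UTF-8', ' ');
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Receives the parser and one chunk of character data; the text of a
    // single element may be delivered over several calls.
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$fragments) {
        $fragments[] = $data;
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $fragments;
}

// Print the fragments in the same style as the logs in this post.
foreach (collect_cdata_fragments('<description>&lt;p&gt;Hello&lt;/p&gt;</description>') as $fragment) {
    echo "cdata: '" . $fragment . "'\n";
}
```

How the text is split into fragments may vary between libxml versions; only the concatenation of all fragments is meaningful.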

 

Here is a detailed example. The input to xml_parse is always the same (extract):

 

<description>&lt;p&gt;&lt;a href="http://www.packtpub.com/nominate-best-open-source-php-cms"&gt;  ....

 

On systems that are working OK, the character data handler function (as configured by xml_set_character_data_handler) receives the following cdata fragments in its second parameter, "string $data":

 

(SimplePie_Parser::tag_open tag: description - attributes: a:0:{})
SimplePie_Parser::cdata: '<'
SimplePie_Parser::cdata: 'p'
SimplePie_Parser::cdata: '>'
SimplePie_Parser::cdata: '<'
SimplePie_Parser::cdata: 'a href="http://www.packtpub.com/nominate-best-open-source-php-cms"'
SimplePie_Parser::cdata: '>'

 

This yields valid HTML: <p><a href="http://www.packtpub.com/nominate-best-open-source-php-cms">
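The HTML only comes out right because the handler appends each fragment to what it already has, rather than replacing it. A minimal sketch of that accumulation, assuming PHP 7 or later; element_text() is a hypothetical helper, not SimplePie's implementation, and it assumes the wanted element is not nested inside another element of the same name.

```php
<?php
// Minimal sketch (assumes PHP 7+): collect the text of one element by
// appending every character data fragment received while inside it.
// element_text() is a hypothetical helper, not SimplePie code.
function element_text(string $xml, string $element): string
{
    $text = '';
    $inside = false;

    $parser = xml_parser_create_ns('UTF-8', ' ');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    // Track whether the parser is currently inside the wanted element.
    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$inside, $element) {
            if ($name === $element) {
                $inside = true;
            }
        },
        function ($p, string $name) use (&$inside, $element) {
            if ($name === $element) {
                $inside = false;
            }
        }
    );

    // Append rather than assign: '<', 'p', '>' ... may arrive as separate calls.
    xml_set_character_data_handler($parser, function ($p, string $data) use (&$inside, &$text) {
        if ($inside) {
            $text .= $data;
        }
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}

echo element_text(
    '<item><title>T</title><description>&lt;p&gt;&lt;a href="http://example.com/"&gt;link&lt;/a&gt;&lt;/p&gt;</description></item>',
    'description'
);
```

On an affected installation the same code would still run, but the joined text would be missing the '<' and '>' characters, as in the log that follows.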

 

On installations having problems it looks like this:

 

(SimplePie_Parser::tag_open tag: description - attributes: a:0:{})
SimplePie_Parser::cdata: 'p'
SimplePie_Parser::cdata: 'a href="http://www.packtpub.com/nominate-best-open-source-php-cms"'

 

As can be seen, there are fewer calls, and the '<' and '>' are just gone!

 

Everything else (Joomla! etc.) works OK, by the way.

 

Any idea why this happens?


After some further investigation, this turns out to be a PHP/libxml bug. It affects only some installations:

 

libxml 2.7.x on PHP < 5.2.9 and

libxml 2.7.0 to 2.7.2 on any PHP version
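The two affected ranges above can be written as a version check with version_compare(). This is a sketch assuming PHP 7 or later; libxml_bug_expected() is a hypothetical helper, and the ranges are taken from this post rather than from an official list.

```php
<?php
// Minimal sketch (assumes PHP 7+): the affected ranges from this post,
// expressed with version_compare(). libxml_bug_expected() is hypothetical.
function libxml_bug_expected(string $libxmlVersion, string $phpVersion): bool
{
    // libxml 2.7.x, i.e. 2.7.0 <= version < 2.8.0
    $libxml27x = version_compare($libxmlVersion, '2.7.0', '>=')
        && version_compare($libxmlVersion, '2.8.0', '<');

    // libxml 2.7.0 to 2.7.2, affected whatever the PHP version
    $libxml270to272 = $libxml27x
        && version_compare($libxmlVersion, '2.7.2', '<=');

    return ($libxml27x && version_compare($phpVersion, '5.2.9', '<'))
        || $libxml270to272;
}

// On the server, the running versions are exposed as constants.
if (defined('LIBXML_DOTTED_VERSION')) {
    var_dump(libxml_bug_expected(LIBXML_DOTTED_VERSION, PHP_VERSION));
}
```

Version numbers alone may be misleading if a distribution has backported a fix, so a runtime test of the parser's actual behaviour is the more reliable check.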

 

http://bugs.php.net/bug.php?id=45996

http://bugs.gentoo.org/show_bug.cgi?id=249703

http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-downgrading-libxml

http://blog.code-head.com/fixing-libxml-php-bug-and-issues-with-html-entities-libexpat

https://glowhost.com/forums/general-support/php5-libxml2-xml-parse-bug-1574.html

https://bugzilla.redhat.com/show_bug.cgi?id=467314

 

Newer versions of SimplePie (version 1.2) have code to work around this bug. Unfortunately, however, Joomla! still uses the old SimplePie version 1.0.1.

 

Here is a simple test that can be used to check for this problem (save the following code to a file called "xmltest.php", upload it to the server hosting your Joomla! installation, and point your browser at it):

 

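<?php

// Parse a tiny document containing an escaped '&'. On an affected
// PHP/libxml combination the entity is dropped, so the element ends up
// with no 'value' entry at all.
$parser_check = xml_parser_create();
xml_parse_into_struct($parser_check, '<foo>&amp;</foo>', $values);
xml_parser_free($parser_check);
$xml_is_sane = isset($values[0]['value']);

if (!$xml_is_sane) {
    echo "XML is broken!";
} else {
    echo "XML is OK!";
}

?>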
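Since the symptom in this thread concerns &lt; and &gt; rather than &amp;, a variant of the check can parse all three entities and compare the decoded value with the expected characters. A sketch assuming PHP 7 or later; predefined_entities_survive() is a hypothetical helper name, and this is an extension of the test above, not something SimplePie ships.

```php
<?php
// Minimal sketch (assumes PHP 7+): check that &lt;, &gt; and &amp; all
// survive parsing. predefined_entities_survive() is a hypothetical helper.
function predefined_entities_survive(): bool
{
    // The parser is released automatically when the function returns.
    $parser = xml_parser_create();
    xml_parse_into_struct($parser, '<foo>&lt;&gt;&amp;</foo>', $values);

    // xml_parse_into_struct() joins the fragments into one 'value';
    // on an unaffected system it is exactly '<>&'.
    return isset($values[0]['value']) && $values[0]['value'] === '<>&';
}

echo predefined_entities_survive() ? "Entities OK" : "Entities are being dropped!";
```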
