
preg_replace strangeness


MK27


I have a line of text that ends like this:

 

supports TEI-Lite, TEI XML, and TEI SGML documents.\",

 

I need to remove the backslash before the quote, so I assumed a preg_replace like this would work:

 

preg_replace("/\\\",$/","\",",$lines[$i-1]);

 

One \ and one " to be replaced with ".  It does not do the replacement correctly, even though it does match, which I realized by testing with:

 

preg_replace("/\\\",$/","XX\",",$lines[$i-1]);

 

Which changes the line to:

 

supports TEI-Lite, TEI XML, and TEI SGML documents.\XX",

 

Hmmm.  So what does work is this:

 

preg_replace("/\\\\\",$/","\",",$lines[$i-1]);

 

To me, this looks like two \ (not one) then ".  The line does not contain that, but this is the preg_replace which gets the job done.

 

What have I misunderstood about php's regular expression handling?


Why don't you just use a plain str_replace?

 

I don't want to replace all occurrences of \" with ", just the last one.

 

I'd still like to know why preg_replace requires that.  It seems like a bona fide bug to me, which I am very surprised to find in a general function in a very widely used and seasoned language.  Can anyone explain this (or is it really a bug)?


It's not a bug. You first need to account for what PHP treats as a backslash escape sequence, and then for what PCRE considers an escape sequence. You are using a double quote inside a double-quoted string, so it needs to be escaped: that's one backslash. You then wish to match a backslash in your input string. To do this, let's say we place a single backslash in the string. PHP will see this as escaping the backslash that is supposed to be escaping the double quote, so you need to escape it to prevent that from happening. At this point we have 3 backslashes in our pattern. Of these 3, only one will survive PHP's string parsing, meaning the regex pattern contains a single backslash. The PCRE engine will treat this backslash as the start of an escape sequence. To counter that, we need to make sure 2 make it through to the PCRE engine, and the only way to do this is to add another 2 to the string. That's 5 backslashes.

 

As kratsg has pointed out, this can be alleviated somewhat by using a single quote string, since the double quote then doesn't need escaping. I think you will still need 4 though not the 3 they suggested.
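
The two layers described above can be checked by printing each pattern after PHP has parsed the string literal and then running it. This is a minimal sketch using the sample line from the first post; the variable names are illustrative and not from the thread.

```php
<?php
// Sample line from the first post: it ends with backslash, quote, comma.
$line = 'supports TEI-Lite, TEI XML, and TEI SGML documents.\\",';

// Layer 1: PHP string parsing.
// Three backslashes in a double-quoted literal leave ONE backslash
// (\\ becomes \, and \" becomes ").
$three = "/\\\",$/";
// Five backslashes leave TWO (each \\ becomes \, and \" becomes ").
$five = "/\\\\\",$/";

echo $three, "\n"; // /\",$/
echo $five, "\n";  // /\\",$/

// Layer 2: PCRE.
// In /\",$/ the backslash escapes the quote, so only ", is matched.
$resultThree = preg_replace($three, "\",", $line);
// In /\\",$/ the pair \\ matches one literal backslash, followed by ",.
$resultFive = preg_replace($five, "\",", $line);

echo $resultThree, "\n"; // unchanged, still ends with \",
echo $resultFive, "\n";  // now ends with ",
```

The three-backslash pattern does match, but only the trailing quote and comma, which is why the "XX" test in the first post put the XX after the backslash instead of replacing it.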



I think I follow you on this but (sorry to nitpick), it is still a bug/oversight:

<?php
$s = 'hello \ world';
$s = preg_replace('/\\/','_', $s);
print $s;
?> 

 

Here I get "PHP Warning:  preg_replace(): No ending delimiter '/' found in /media/sda6/root/php/test.php on line 3"

 

I did find this in a tutorial:

If you are looking for a backslash, you need to escape that also. But, we also need to escape the control character too, which is itself a backslash, hence we need to escape twice like this

\\\\

 

In fact three \\\ does work.
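
Why three and four backslashes both work in a single-quoted literal can be seen by comparing the resulting strings. This is a small sketch with illustrative variable names, relying on the fact that single-quoted PHP strings only treat \\ and \' as escape sequences.

```php
<?php
$s = 'hello \ world';

// Two backslashes: the string is /\/ (3 characters). PCRE reads \/ as an
// escaped delimiter, hence the "No ending delimiter" warning.
$two = '/\\/';
// Three backslashes: \\ becomes \, then \/ is kept as-is, giving /\\/.
$three = '/\\\/';
// Four backslashes: each \\ becomes \, giving the same string /\\/.
$four = '/\\\\/';

echo strlen($two), "\n";       // 3
var_dump($three === $four);    // bool(true)

$replaced = preg_replace($three, '_', $s);
echo $replaced, "\n";          // hello _ world
```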

 

I'm coming from perl, where that is not the case altho the delimiters, PCRE, etc, are the same -- this story about how logically you must "escape the control character" is a bit bogus.  It's due to an oversight in the design (again: not to gripe!  just honesty).  I guess knowing about the issue suffices as a "fix".

 


Just because it doesn't work how you want it to, does not mean there was an oversight or that there is a bug. The issue IS to do with escaping characters successfully, so that at the point the PCRE engine receives them they form a valid PCRE pattern. It has nothing to do with preg_replace, it's just the way PHP handles strings. Obviously I was slightly incorrect in my previous post, but hey, it gets confusing and it's been a long day. If you don't believe me, try echo'ing out your pattern before you pass it to preg_replace.

 

$pattern = '/\/';
print $pattern . '<br/>';
$pattern = '/\\/';
print $pattern . '<br/>';
// etc...
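
One way to avoid counting backslashes for the PCRE layer is to let preg_quote() add the regex escaping, so only the PHP string layer has to be written by hand. Nobody in the thread suggested this; it is offered as a sketch, and the variable names are illustrative.

```php
<?php
$line = 'supports TEI-Lite, TEI XML, and TEI SGML documents.\\",';

// The literal text to match: backslash, quote, comma (3 characters).
$needle = '\\",';

// preg_quote() adds the PCRE-level escaping; the second argument makes it
// escape the delimiter as well.
$pattern = '/' . preg_quote($needle, '/') . '$/';

echo $pattern, "\n"; // /\\",$/

$fixed = preg_replace($pattern, '",', $line);
echo $fixed, "\n";   // line now ends with ",
```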


Just because it doesn't work how you want it to, does not mean there was an oversight or that there is a bug. [...] It has nothing to do with preg_replace, it's just the way PHP handles strings.

 

I don't want to sound like some perl jerk who's here to slam php, but the fact that this is a result of "the way PHP handles strings" still makes it seem like an oversight or consequence of the design.  ::)  I can also see why there is no need to "fix" that, tho.

 

try echo'ing out your pattern before you pass it to preg_replace.

 

That is enlightening, thanks.
