Converting Smart Quotes to Regular Quotes


kittrellbj

I am having more trouble with this than is probably reasonable.  Here's what I'm trying to do:

 

1. Copy text from a word processor (Microsoft Office or OpenOffice) into a .txt file (Notepad, etc.).

2. Convert the .txt file into a .html file.

 

The problem I'm running into is that smart quotes (curly quotes), long dashes, and curly apostrophes turn into ? in the final document.  I've searched Google for a way to convert these troublesome characters into plain double quotes (") and apostrophes ('), but none of the solutions I found work.

 

I'm working on a Windows XP machine, using XAMPP as my work environment.  Most people submitting the .txt files will be on Windows computers.  (I know that Microsoft has done wonders in messing up the encoding system with regard to smart quotes...)

 

I've tried:

function convert_smart_quotes($string) {

$quotes = array(
    "\xC2\xAB"     => '"', // « (U+00AB) in UTF-8
    "\xC2\xBB"     => '"', // » (U+00BB) in UTF-8
    "\xE2\x80\x98" => "'", // ‘ (U+2018) in UTF-8
    "\xE2\x80\x99" => "'", // ’ (U+2019) in UTF-8
    "\xE2\x80\x9A" => "'", // ‚ (U+201A) in UTF-8
    "\xE2\x80\x9B" => "'", // ‛ (U+201B) in UTF-8
    "\xE2\x80\x9C" => '"', // “ (U+201C) in UTF-8
    "\xE2\x80\x9D" => '"', // ” (U+201D) in UTF-8
    "\xE2\x80\x9E" => '"', // „ (U+201E) in UTF-8
    "\xE2\x80\x9F" => '"', // ‟ (U+201F) in UTF-8
    "\xE2\x80\xB9" => "'", // ‹ (U+2039) in UTF-8
    "\xE2\x80\xBA" => "'", // › (U+203A) in UTF-8
);
// Return the converted string (the untouched $string was being returned before)
return strtr($string, $quotes);
}

 

and also

<?php 

function convert_smart_quotes($string) 
{ 
    $search = array(chr(145), 
                    chr(146), 
                    chr(147), 
                    chr(148), 
                    chr(151)); 

    $replace = array("'", 
                     "'", 
                     '"', 
                     '"', 
                     '-'); 

    return str_replace($search, $replace, $string); 
} 

?>

 

and I also tried replacing them with their HTML entities instead...

 

<?php 

$replace = array('&lsquo;', 
                 '&rsquo;', 
                 '&ldquo;', 
                 '&rdquo;', 
                 '&mdash;'); 

?>

 

Nothing seems to work.  I know it has something to do with the encoding, but I can't seem to figure out a way to replace these little buggers and keep from having a million ? symbols throughout the file. :(
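
Since the symptoms point to an encoding mismatch, here is a minimal sketch of one possible approach, not a confirmed fix. It assumes the mbstring extension is enabled, and that any input which is not valid UTF-8 was saved by Notepad as "ANSI" (Windows-1252). The function name normalize_smart_punctuation is hypothetical. The idea is to bring everything to UTF-8 first, so a single replacement table works no matter how the .txt file was saved:

```php
<?php
// Hypothetical helper (not from the original post): normalize the input to
// UTF-8 first, then swap typographic punctuation for plain ASCII.
function normalize_smart_punctuation($text)
{
    // Assumption: input that is not valid UTF-8 is Windows-1252, which is
    // what Notepad's "ANSI" option writes on Western-locale Windows.
    if (!mb_check_encoding($text, 'UTF-8')) {
        $text = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
    }

    // UTF-8 byte sequences for the characters word processors insert.
    $map = array(
        "\xE2\x80\x98" => "'",   // U+2018 left single quote
        "\xE2\x80\x99" => "'",   // U+2019 right single quote / apostrophe
        "\xE2\x80\x9C" => '"',   // U+201C left double quote
        "\xE2\x80\x9D" => '"',   // U+201D right double quote
        "\xE2\x80\x93" => '-',   // U+2013 en dash
        "\xE2\x80\x94" => '-',   // U+2014 em dash
        "\xE2\x80\xA6" => '...', // U+2026 ellipsis
    );

    return strtr($text, $map);
}

// Windows-1252 bytes, as an "ANSI" .txt file would contain them.
echo normalize_smart_punctuation("\x93quoted\x94 text \x97 it\x92s fine");
// prints: "quoted" text - it's fine
```

The same call also handles input that is already UTF-8, because the conversion step is skipped when the bytes validate as UTF-8. The guess can be wrong for a Windows-1252 file whose bytes happen to form valid UTF-8, so an explicit encoding choice at upload time would be more reliable. Serving the resulting .html page with a charset that matches the bytes actually written is a separate thing to check.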
