MS Word Special Character Replacement


fohanlon

Hi Guys

 

After searching the web and forums I still cannot find a suitable solution for my problem.

 

I have a textarea.  Say the user copies in from MS Word the following:

 

“!"£$%^&*()@'#:;/?!"£$%^&*()@'#:;/?!"£$%^&*()@'#:;/?!"£$%^&*()@'#:;/?!"£$%^&*()@'#:;/?!"£$%^&*()@'#:;/?!"£$%^&*()@'#:;/?!"£$!"£$%^&”

 

Before I save it to a database I want to clean it up, i.e. replace or remove the Â.

 

I have read that the Â will take up 2 characters.

 

Also, if I could replace MS Word's single and double curly quotes, that would be great.  I was looking at str_replace with the chr() function, but to no avail.

 

Thanks for reading.

 

Regards

 

Fergal.

 

str_replace works for me, as long as you have the correct encoding.

 

I did this little page to test it:

<?php
mb_internal_encoding("UTF-8");
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
	<meta http-equiv="content-type" content="text/html;charset=utf-8" />
</head>
<body>
	<form action="" method="post">
		<textarea name="inputText"></textarea>
		<input type="submit">
	</form>
	<br />
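	<?php
	if (isset($_POST['inputText'])) {
		$text = $_POST['inputText'];
		// Replace each "Â" with "A" and strip Word's curly double quotes.
		// This file must itself be saved as UTF-8 for the literals to match.
		$clean = str_replace(array("Â", "“", "”"), array("A", "", ""), $text);
		// stripslashes() only undoes magic-quotes escaping;
		// htmlspecialchars() keeps pasted markup from being rendered as HTML.
		echo htmlspecialchars(stripslashes($clean), ENT_QUOTES, 'UTF-8');
	}
	?>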
</body>
</html>

 

It outputs the same text, with each Â converted to A and Word's curly quotes removed.

Note: I used stripslashes just to see the output, because the single quotes were escaped, but you definitely want to use something like mysql_real_escape_string before inserting something like that into a database.
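
If the goal is to keep the quotes but turn them into plain ASCII rather than delete them, something along these lines might work. This is only a sketch: the function name normalize_word_punctuation is made up for illustration, the list of characters is a guess at what Word commonly inserts, and it assumes the input is already valid UTF-8 and that PHP 7 or later is available (for the \u{...} escapes).

```php
<?php
// Hypothetical helper (not from the thread): maps common Word "smart"
// punctuation to plain ASCII. Assumes $text is valid UTF-8.
function normalize_word_punctuation(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark / apostrophe
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => "-",   // en dash
        "\u{2014}" => "-",   // em dash
        "\u{2026}" => "...", // horizontal ellipsis
        "\u{00A0}" => " ",   // non-breaking space
    ];

    // strtr() with an array replaces every key in a single pass,
    // so replacements are never re-scanned and replaced again.
    return strtr($text, $map);
}
```

Called on the posted value before the database step, it would leave ordinary characters such as £ untouched and only rewrite the characters listed in the map.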

 

Hope this helps

The Â is not some magic letter. If they tried copying more exotic characters you'd find something else instead of that Â.

 

Make sure your HTML page is in UTF-8 encoding. Make sure your database tables are in UTF-8 encoding.
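
As a rough illustration of those two settings on the PHP side, the fragment below sends a UTF-8 content type and switches the database connection to UTF-8. It uses the mysqli extension rather than the older mysql_* functions mentioned earlier, and the host, user, password and database name are placeholders, not values from this thread. The utf8mb4 charset assumes a reasonably recent MySQL server.

```php
<?php
// Tell the browser the page is UTF-8 (must run before any output is sent).
header('Content-Type: text/html; charset=utf-8');

// Placeholder connection details; replace with your own.
$db = new mysqli('localhost', 'db_user', 'db_password', 'db_name');

// Make the client connection use UTF-8 so the bytes are not
// re-encoded on their way to or from the server.
$db->set_charset('utf8mb4');
```

The tables and columns themselves still need a UTF-8 character set; this only covers the page and the connection.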

With those two set you shouldn't have to alter any data. In the worst (and unlikely) case the pasted text will be double-encoded, in which case you need to decode it once to get the original data.
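
For the double-encoded case, decoding once could look roughly like the sketch below. It assumes the damage came from UTF-8 bytes being misread as ISO-8859-1 and then encoded to UTF-8 again (which is how £ turns into Â£), and it needs the mbstring extension. The name undo_double_encoding is invented for this example. It is a heuristic: text that legitimately contains a sequence like Â£ would be changed too, so it should only be run on data known to be affected.

```php
<?php
// Hypothetical helper (not from the thread): undoes ONE round of double
// encoding, assuming UTF-8 bytes were misread as ISO-8859-1 and re-encoded.
function undo_double_encoding(string $text): string
{
    // Only text made entirely of code points U+0000..U+00FF can be the
    // result of this kind of double encoding. Anything else (or invalid
    // UTF-8, for which preg_match() returns false) is left alone.
    if (preg_match('/^[\x{00}-\x{FF}]*$/u', $text) !== 1) {
        return $text;
    }

    // Map each code point back to the single byte it was misread from.
    $decoded = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Accept the result only if it changed and is itself valid UTF-8.
    if ($decoded !== $text && mb_check_encoding($decoded, 'UTF-8')) {
        return $decoded;
    }

    return $text;
}
```

Correctly encoded text such as "café" comes back unchanged, because decoding it would produce a byte sequence that is not valid UTF-8.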
