
Special characters


rgarrot


Hi People!

 

I'm from Brazil and I'm having some trouble with special characters when I fetch an object from a REST web service.

 

After I send the GET request, I receive the response, and in the message body...

 

I should get this (in Portuguese):

"União da Ilha"

"Braços abertos"

 

But I'm getting:

"Un]e0o da ilha"

"Bra{f3os abertos"

 

The problem is with the special Latin characters.

I've tried almost all of the encoding functions, like utf8_encode, htmlentities, etc.

 

Could someone help me?

Thanks a lot! (And sorry for my poor English.)

Link to comment
Share on other sites

Normally mojibake shows as unusual characters, not things you'd find on a keyboard.

 

Can you post what you get when you run

echo bin2hex($string);

(if $string is the response)? Without any modifications to it.

 

Also try using mb_detect_encoding on that response. Sometimes it can tell you what encoding the string is in; other times it can't.
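The two checks suggested above can be combined into one small script. This is only a sketch: the sample value is the byte sequence reported later in this thread, the candidate encoding list is an assumption, and mb_detect_encoding needs the mbstring extension (the code falls back to a placeholder if it is not loaded).

```php
<?php
// Sample bytes as reported in this thread for "União da Ilha".
$string = "Uni;e0o da Ilha";

// Hex dump of the raw bytes, before any conversion is applied.
echo bin2hex($string), PHP_EOL; // 556e693b65306f20646120496c6861

// Strict detection against an explicit candidate list (the list is an assumption).
// A string made only of 7-bit bytes is reported as ASCII, which says nothing
// about how the accented characters were escaped.
$encoding = function_exists('mb_detect_encoding')
    ? mb_detect_encoding($string, ['ASCII', 'UTF-8', 'ISO-8859-1'], true)
    : 'unknown (mbstring not loaded)';
echo $encoding, PHP_EOL;

// True when no byte has the high bit set, i.e. the accents are not
// present in the data as encoded bytes at all.
$isSevenBit = !preg_match('/[\x80-\xFF]/', $string);
var_dump($isSevenBit);
```

If the string turns out to be pure 7-bit, functions such as utf8_encode have nothing to convert, because the accented characters were already replaced before the response was sent.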

Link to comment
Share on other sites

With mb_detect_encoding, I sometimes get ASCII and sometimes UTF-8:

 

ASCII -  Uni;e0o da Ilha

ASCII -  Unidos da Tijuca

ASCII -  Portela

ASCII -  Unidos de Vila Isabel

ASCII -  Uni;e0o do Parque Curicica

UTF-8 - Uni;e0o de Jacarepagu80

ASCII - Infantes do Lins

 

And with bin2hex():

 

556e693b65306f20646120496c6861 - G.R.E.S Uni;e0o da Ilha do Governador

556e69646f732064612054696a756361 - G.R.E.S Unidos da Tijuca

506f7274656c61 - G.R.E.S Portela

556e69646f732064652056696c612049736162656c - G.R.E.S Unidos de Vila Isabel

556e693b65306f20646f20506172717565204375726963696361 - G.R.E.S Uni;e0o do Parque Curicica

556e693b65306f206465204a616361726570616775103263 - G.R.E.S Uni;e0o de Jacarepagu80

472e522e432e452e532e4d2e20496e66616e74657320646f204c696e73 - Infantes do Lins

 

Any idea?

Link to comment
Share on other sites

I can't tell what it is. Each accented character is represented by a marker byte (a semicolon, 0x3B, or the control character 0x10) followed by two hex digits, but I don't see a correlation between those digits and the original character.

0x3B 0x65 0x30 = ã
0x10 0x32 0x63 = á
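To see how many different sequences are actually in the data, one option is to scan the responses for "marker byte plus two hex digits" and count each distinct match. This is a rough sketch: the marker set in the pattern (control bytes plus ; ] { }) is an assumption based on the samples posted in this thread, and the real service may use others.

```php
<?php
// Sample values copied from the posts in this thread.
$names = [
    "Uni;e0o da Ilha",
    "Uni;e0o do Parque Curicica",
    "Bra{f3os abertos",
    "Cora}65?es Unidos do Amarelinho",
    "Portela",
];

// Assumed marker set: a control byte or one of ; ] { } followed by two hex digits.
$pattern = '/[\x00-\x1F;\]{}][0-9a-fA-F]{2}/';

$counts = [];
foreach ($names as $name) {
    if (preg_match_all($pattern, $name, $matches)) {
        foreach ($matches[0] as $sequence) {
            // Key by hex so control bytes stay visible in the output.
            $key = bin2hex($sequence);
            $counts[$key] = ($counts[$key] ?? 0) + 1;
        }
    }
}

ksort($counts);
foreach ($counts as $hex => $n) {
    printf("%s  %d\n", $hex, $n);
}
```

The resulting list is only an inventory. Which character each sequence stands for still has to be worked out by comparing against the expected names.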

I suggest you contact the people who own the web service and ask them about this.

 

Short of that, you can manually substitute those sequences with, for example, strtr():

$string = strtr($string, array(
    "\x3B\x65\x30" => "ã",
    "\x10\x32\x63" => "á"
));

If you do this, be sure to save the file containing this code in whatever encoding you want the output characters to be in. So save the file as UTF-8 if you want that, or ISO-8859-1 if you want that.
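A variation that avoids depending on the file's encoding is to write the replacement characters as byte escapes. The sketch below assumes UTF-8 output is wanted and covers only the two sequences identified above. The test input is one of the hex dumps posted earlier in the thread.

```php
<?php
// Mapping for the two sequences identified in this thread.
// The replacements are spelled as UTF-8 byte escapes, so the result does
// not depend on the encoding this source file is saved in.
$map = [
    "\x3B\x65\x30" => "\xC3\xA3", // ã (U+00E3) in UTF-8
    "\x10\x32\x63" => "\xC3\xA1", // á (U+00E1) in UTF-8
];

// Raw bytes rebuilt from the bin2hex() output posted above.
$raw = hex2bin('556e693b65306f206465204a616361726570616775103263');

$fixed = strtr($raw, $map);
echo $fixed, PHP_EOL; // União de Jacarepaguá (when viewed as UTF-8)
```

Any sequence that is missing from the map passes through unchanged, so the output should still be checked against the expected names.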

Link to comment
Share on other sites

OMG!

Sometimes the same character has a different code...

 

I replaced:

Acadêmicos da Abolição - 41636164c382c2ab36396d69636f732064612041626f6c697865663b63306f -

 

But:

Cora}65?es Unidos do Amarelinho - 436f72617d36353f657320556e69646f7320646f20416d6172656c696e686f

 

"Corações Unidos do Amarelinho"

 

The "ç" is encoded differently here...

 

 

 
