Decoding Numeric HTML Entities via PHP

I have this code to decode numeric HTML entities to their UTF-8 equivalent characters. I'm trying to convert this entity: &#146; which should output: ’ However, it just d

Solution 1:

html_entity_decode already does what you're looking for:

$string = '&#146;';

echo html_entity_decode($string, ENT_COMPAT, 'UTF-8');

It will return a single character that is invisible when printed:

binary hex: c292

Which is PRIVATE USE TWO (U+0092), a C1 control character. Despite the name, it is not part of Unicode's Private Use Area. Because it is a control character, your PHP version or configuration might not return it at all.
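A quick way to see what your own PHP build returns is to dump the result as hex. The sketch below uses a hypothetical helper name, decoded_hex, which is not part of the original answer. As far as I know, PHP 5.4 and later leave a numeric reference untouched when its code point is not permitted in the selected document type (HTML 4.01 by default), so treat both outcomes as possible:

```php
<?php
// Hypothetical helper for illustration: return the bytes produced by
// html_entity_decode() as a hex string, so invisible characters show up.
function decoded_hex(string $entity): string
{
    $decoded = html_entity_decode($entity, ENT_COMPAT, 'UTF-8');
    return bin2hex($decoded);
}

// "c292" means U+0092 was produced; "26233134363b" is the hex of the
// literal text "&#146;", meaning the reference was left undecoded.
echo decoded_hex('&#146;'), "\n";
```

Either way, the result is not the curly apostrophe the question expects.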

Also there are some more quirks:

But in HTML (other than XHTML, which uses XML rules), it's a long-standing browser quirk that character references in the range &#128; to &#159; are misinterpreted to mean the characters associated with bytes 128 to 159 in the Windows Western code page (cp1252) instead of the Unicode characters with those code points. The HTML5 standard finally documents this behaviour.

See: &#146; is getting converted as “\u0092” by nokogiri in ruby on rails
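If you want PHP to follow the browser behaviour described above, one option is to rewrite references in the 128 to 159 range to their cp1252 equivalents before decoding. This is only a minimal sketch: the function name decode_entities_like_browser is made up for illustration, and the table is the cp1252 mapping for the bytes that code page defines.

```php
<?php
// Hypothetical helper (not from the original answer).
function decode_entities_like_browser(string $html): string
{
    // cp1252 byte value => Unicode code point, for the bytes 128-159 that
    // cp1252 actually defines (129, 141, 143, 144 and 157 are unassigned).
    $map = [
        0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
        0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
        0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
        0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
        0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
        0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
        0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
    ];

    // Rewrite references such as &#146; or &#x92; to the reference for the
    // mapped code point (&#8217;); all other references are left untouched.
    $html = preg_replace_callback(
        '/&#(?:x([0-9a-f]{1,6})|([0-9]{1,7}));/i',
        function (array $m) use ($map): string {
            $cp = ($m[1] !== '') ? hexdec($m[1]) : (int) $m[2];
            return isset($map[$cp]) ? '&#' . $map[$cp] . ';' : $m[0];
        },
        $html
    );

    return html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo decode_entities_like_browser('It&#146;s'), "\n"; // It’s
```

An explicit table is used here rather than a charset conversion function because the five unassigned cp1252 bytes are handled differently by different conversion libraries. References outside the table fall through to html_entity_decode unchanged.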