I am not really a giraffe. Posted July 26, 2011
On the Story page (http://www.dailydiapers.com/content/stories.html), I always have to change the character encoding setting in Firefox from Unicode to Western (ISO-8859-1). This is the only website on the entire interwebz that I have to do that to. My Auto-Detect is set to Universal. What gives? I have lived with this since the beginning of time, so it's no huge deal. I just wonder if anyone else is experiencing this.
I am not really a giraffe. Posted October 21, 2012
So, a year and change later and no answer? I know I said no biggie, but I'm just dying to know what I'm doing wrong.
Zander Posted October 23, 2012
Most of the content on the story pages isn't actually Unicode (I presume you mean UTF-8 specifically). In a lot of the cases I've seen, the text is Windows-125x encoded, but the server sends a "Content-Type: text/html; charset=UTF-8" HTTP header with the response.

That is also why switching to Western works for you. ISO-8859-1 is not a byte-level subset of UTF-8: only the ASCII range (bytes 0-127) is encoded identically in both. Bytes 128-255 are complete characters in ISO-8859-1 and Windows-1252, but in UTF-8 they are only valid as parts of multi-byte sequences, so a Windows-1252 curly quote or accented letter on its own is invalid UTF-8. A single-byte decoder never hits an invalid sequence, and Firefox treats the ISO-8859-1 label as Windows-1252 anyway, so choosing Western effectively applies the encoding the pages really use. You're not doing anything wrong; the server is mislabelling the pages.

The response has no Content-Encoding header either, but that header only describes compression (e.g. gzip), not the character set, so it isn't relevant here.

The pages also aren't valid (X)HTML, so browsers probably render them in quirks mode as loose HTML 4. That affects layout, though, not which character encoding is used.

The pages do include a meta http-equiv="Content-Type" directive declaring Windows-1252, but browsers won't honour it here: under the HTML encoding-sniffing rules, a charset given in the HTTP Content-Type header takes precedence over a meta declaration inside the page.

Suggestions for a local fix, then:
1) Use whatever is in the meta http-equiv="Content-Type" as the charset of the page (i.e. Windows-1252).
2) If that doesn't work, decode the page as UTF-8 and drop any invalid or extra-wide characters.
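The two-step local fix suggested above can be sketched in Python, assuming you already have the raw page bytes. This is a minimal illustration, not what Firefox does internally: `sniff_meta_charset` and `decode_story_page` are hypothetical names, the regex is a loose approximation of the HTML spec's meta prescan, and the UTF-8 fallback only drops bytes that are invalid UTF-8 (it does nothing about "extra-wide" characters). Note that Python's `latin-1` codec is strict ISO-8859-1, whereas browsers treat the ISO-8859-1 label as Windows-1252, so `cp1252` is the closer match to Firefox's "Western" setting.

```python
import re
from typing import Optional

# Hypothetical helper: a loose stand-in for the browser's <meta> prescan.
# It only scans the first 1024 bytes with a regex; it is NOT the full
# algorithm from the HTML specification.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def sniff_meta_charset(raw: bytes) -> Optional[str]:
    """Return the lower-cased charset named in a <meta> tag, or None."""
    match = _META_CHARSET.search(raw[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def decode_story_page(raw: bytes) -> str:
    """Sketch of the local fix: 1) trust the <meta> charset, 2) else UTF-8."""
    charset = sniff_meta_charset(raw)
    if charset:
        try:
            return raw.decode(charset)
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name, or bytes invalid in it: fall through
    # Step 2: decode as UTF-8, silently dropping invalid bytes.
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    text = "\u201cquoted\u201d caf\u00e9"
    raw = text.encode("cp1252")          # b'\x93quoted\x94 caf\xe9'
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        print("UTF-8 fails:", err)
    print(repr(raw.decode("latin-1")))  # strict ISO-8859-1: C1 control codes, not quotes
    print(raw.decode("cp1252"))         # Windows-1252: the curly quotes come back
```

The fallback order mirrors the post: the page's own declaration is tried first because it names the encoding the text was written in, and lossy UTF-8 decoding is kept as a last resort.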