[wx-dev] Encoding ISO-8859-11 on Windows

Robert Roebling robert at roebling.de
Sat Sep 2 05:19:46 PDT 2006


> JS> When using an encoding of ISO-8859-11 on Windows XP, wxWidgets tries to 
> JS> map it to code page 28601 in src/msw/utils.cpp.
> JS> But this code fails to find a valid code page:
> JS> 
> JS>     if (::IsValidCodePage(ret) == 0)
> JS>         return -1;
> JS> 
> JS> This is in src/msw/utils.cpp, function wxEncodingToCodepage.
> JS> 
> JS> On the other hand this page:
> JS> 
> JS> http://en.wikipedia.org/wiki/ISO_8859-11
> JS> 
> JS> thinks that the code page should be 874. Any reason not to change the 
> JS> mapping to 874, given that CP 28601 appears not to exist?
> 
>  No, CP874 seems to be the right one for ISO-8859-11. And in fact other CP
> are incorrect as well if you trust this list:
> 
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/wceinternational5/html/wce50conCodePages.asp
> 
>  Could you please update them if you change this one?

Not so fast. The idea of the code is that if certain ISO code pages
are installed at some point (language packs, a newer version of IE),
the code can test for the presence of the exact ISO encoding rather
than settle for something similar from the OEM world. A case in
point: the previous code replaced ISO8859-15 with Latin1, killing
all Euro signs. If the specific Windows version doesn't have
ISO8859-15 installed, the code in wxConv falls back to the
wxWidgets built-in tables (and this works correctly now).
Here is another list:

http://msdn.microsoft.com/workshop/author/dhtml/reference/charsets/charset4.asp
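To make the intended behaviour concrete, here is a minimal sketch of that decision, not the actual code in src/msw/utils.cpp or wxConv. The names Resolution and ResolveEncoding are made up for illustration, and the isInstalled callback stands in for a system query such as ::IsValidCodePage() so that the sketch compiles on any platform.

```cpp
#include <functional>

// Hypothetical result type: either an installed Windows code page, or a
// request to use the library's built-in conversion tables instead.
struct Resolution
{
    bool useBuiltinTables;
    int  codePage;   // meaningful only when useBuiltinTables is false
};

// exactCodePage: the ISO code page for the requested encoding,
//                e.g. 28605 for ISO-8859-15 (or -1 if there is none).
// isInstalled:   assumption, a stand-in for ::IsValidCodePage().
Resolution ResolveEncoding(int exactCodePage,
                           const std::function<bool(int)>& isInstalled)
{
    if (exactCodePage > 0 && isInstalled(exactCodePage))
        return {false, exactCodePage};

    // Deliberately no substitution by a "similar" code page here:
    // replacing ISO-8859-15 with Latin1 is what lost the Euro sign.
    return {true, -1};
}
```

On a system where only 28591 is installed, asking for 28605 therefore selects the built-in tables instead of silently using Latin1.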

If I understand it correctly, Windows in principle understands these:

28591              iso-8859-1                   Western European (ISO)
28592              iso-8859-2                   Central European (ISO)
28593              iso-8859-3                   Latin 3 (ISO)
28594              iso-8859-4                   Baltic (ISO)
28595              iso-8859-5                   Cyrillic (ISO)
28596              iso-8859-6                   Arabic (ISO)
28597              iso-8859-7                   Greek (ISO)
28598              iso-8859-8                   Hebrew (ISO-Visual)
28599              iso-8859-9                   Turkish (ISO)
28603              iso-8859-13                  Estonian (ISO)
28605              iso-8859-15                  Latin 9 (ISO)

So we'd need to remove the case for 8859-11 with 874;
8859-12 is still under development for Gaelic (hence
still commented out), and 8859-14 does not seem to have
any correspondence under Windows, so that should be
commented out, too.
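As a rough sketch of what the mapping would look like if it followed the list above (the function name IsoPartToCodePage is made up here, and this is not the existing wxEncodingToCodepage code): every listed code page happens to be 28590 plus the ISO-8859 part number, and any part not in the list yields -1 so that the caller can use the built-in tables.

```cpp
// Hypothetical helper: maps the part number N of ISO-8859-N to the
// Windows code page from the list above, or -1 if the list has no entry.
int IsoPartToCodePage(int part)
{
    switch (part)
    {
        case 1: case 2: case 3: case 4: case 5:
        case 6: case 7: case 8: case 9:
        case 13:
        case 15:
            // All listed entries follow the pattern 2859N / 286NN.
            return 28590 + part;

        default:
            // Includes 8859-11, 8859-12 and 8859-14: no entry in the list.
            return -1;
    }
}
```

With this, IsoPartToCodePage(15) gives 28605, while IsoPartToCodePage(11) gives -1 rather than 874.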

  Robert





