[wxPython-users] wxPython and Unicode and widgets

Christopher Barker Chris.Barker at noaa.gov
Fri Dec 21 09:52:21 PST 2007


Here's how I think about it:

No matter how you slice it, and whether you are dealing with ANSI or 
Unicode, you NEED to know the encoding your string of bytes is in, and 
you need to deal with that appropriately. Period, end of story.

Unicode can appear to cause additional problems because Unicode 
encodings can hold many, many more code points, and thus when you try 
to translate from Unicode to an ANSI encoding, it is quite possible to 
have a code point that cannot be translated. However, this problem is 
eliminated if you use Unicode everywhere, AND you know what encoding 
you're dealing with.
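
To make that concrete, here is a minimal sketch. It assumes Python 3, where 
str is Unicode text; the sample string and the choice of Latin-1 as a 
stand-in for an "ANSI" encoding are illustrations only:

```python
# A string mixing Latin and Greek letters; Latin-1 has no Greek code points.
text = "caf\u00e9 \u03b1\u03b2"  # 'café αβ'

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = text.encode("utf-8")

# Latin-1 (standing in for an "ANSI" code page) cannot hold the Greek letters,
# so the default strict mode raises instead of translating.
try:
    text.encode("latin-1")
    failed = False
except UnicodeEncodeError:
    failed = True

# Staying in Unicode and round-tripping through UTF-8 loses nothing.
round_tripped = utf8_bytes.decode("utf-8")
```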

Where I find Python a bit frustrating is that I don't think you can tell 
it, on the application level, to use the "replace" or 
"ignore" flag by default with encode(). I'd often rather get garbage 
than an exception in my apps!
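
One possible workaround, sketched here under the assumption of Python 3, is 
to route every encode() call through a single helper that passes the error 
mode explicitly. The name lenient_encode and its defaults are hypothetical, 
not part of Python or wxPython:

```python
def lenient_encode(text, encoding="ascii", errors="replace"):
    """Hypothetical helper: encode without raising on unmappable characters."""
    # str.encode accepts the error mode as its second argument.
    return text.encode(encoding, errors)


# "replace" substitutes '?' for anything the target encoding cannot hold.
replaced = lenient_encode("na\u00efve")

# "ignore" silently drops the unmappable character instead.
ignored = lenient_encode("na\u00efve", errors="ignore")
```

This gives garbage rather than an exception, but only where the helper is 
actually used; library code that calls encode() itself is unaffected.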

Anyway, if you use Unicode everywhere in your app, then you "only" have 
to deal with these issues on I/O -- you need to know the encoding of 
data you're reading in, and you need to provide the needed encoding for 
data you're writing out. You need to do that right for ANSI too, so you 
haven't lost anything.
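
A rough sketch of that "decode on the way in, encode on the way out" 
pattern, assuming Python 3. The helper names read_text and write_text, the 
file names, and the Latin-1 input are all made up for illustration:

```python
import io
import os
import shutil
import tempfile


def read_text(path, encoding):
    # Input boundary: bytes on disk become Unicode text in the app.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding):
    # Output boundary: Unicode text is encoded explicitly when written.
    with io.open(path, "w", encoding=encoding) as f:
        f.write(text)


tmpdir = tempfile.mkdtemp()
try:
    src = os.path.join(tmpdir, "in.txt")
    dst = os.path.join(tmpdir, "out.txt")

    # Simulate a legacy file that we know is Latin-1 encoded.
    with open(src, "wb") as f:
        f.write(b"caf\xe9")

    text = read_text(src, "latin-1")   # Unicode inside the app
    write_text(dst, text, "utf-8")     # chosen encoding on the way out

    with open(dst, "rb") as f:
        out_bytes = f.read()
finally:
    shutil.rmtree(tmpdir)
```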

Which brings up another point touched on in this thread -- "only I/O". 
It's very handy to use "print" with Python, but that's I/O, and most 
terminals don't seem to support Unicode, or Python doesn't know what 
encoding to send to the terminal, so, again -- more pain.
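
One way to take some of the pain out of printing, sketched for Python 3: 
ask the stream what encoding it claims and replace anything it can't show. 
The function safe_print is a hypothetical helper, and the ASCII-only stream 
below is just a stand-in for a terminal that can't display Unicode:

```python
import io
import sys


def safe_print(text, stream=None):
    """Hypothetical helper: write text, replacing what the stream can't encode."""
    if stream is None:
        stream = sys.stdout
    # Fall back to ASCII if the stream does not report an encoding.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the stream's encoding so unmappable characters
    # become '?' instead of raising UnicodeEncodeError.
    printable = text.encode(encoding, "replace").decode(encoding)
    stream.write(printable + "\n")
    return printable


# Stand-in for a terminal that only understands ASCII.
demo_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
shown = safe_print("\u03b1 = alpha", demo_stream)
```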

So why bother with Unicode at all, if you still have "which encoding" 
confusion? Two reasons:

-- Unicode can hold more code points (theoretically all of them for 
all languages), so you can have multiple languages supported within one 
document, which can be a nice feature.

-- It's the way the computing world is going, so you're going to have to 
deal with it anyway. If you start getting unicode data as input, you're 
going to be a whole lot better off if you're using Unicode internally, 
otherwise you WILL lose data (at best) or crash your app (at worst) if 
you get data that can't be represented in ANSI.

We're dealing with this right now in a Web app that gets data from a lot 
of sources, and it's using libs that aren't fully Unicode-aware. It's all too 
easy for it to accept input from the browser, save it in the database, 
then crash when you try to edit it again -- aarrgg!!

Chris Mellon wrote:
 > Once you understand the fundamental difference between unicode and raw
 > bytes, it's really not that difficult to understand, and
 > encoding/decoding isn't that complicated.

Here's a good start to understanding Unicode:
http://www.joelonsoftware.com/articles/Unicode.html

And here's one that's Python-specific:
http://boodebr.org/main/python/all-about-python-and-unicode

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov



