[wxPython-users] wxPython and Unicode and widgets
Christopher Barker
Chris.Barker at noaa.gov
Fri Dec 21 09:52:21 PST 2007
Here's how I think about it:
No matter how you slice it, and whether you are dealing with ANSI or
Unicode, you NEED to know the encoding your string of bytes is in, and
you need to deal with that appropriately. Period, end of story.
Unicode can appear to cause additional problems because Unicode
encodings can hold many, many more code points, and thus when you try
to translate from Unicode to an ANSI encoding, it is quite possible to
have a code point that cannot be translated. However, this problem is
eliminated if you use Unicode everywhere, AND you know what encoding
you're dealing with.
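
To make the "can't be translated" case concrete, here is a minimal
sketch. It assumes Python 3 syntax (where str is Unicode text and bytes
is encoded data), and the sample string is only an illustration:

```python
# Text that mixes accented Latin and Greek characters (sample data only).
text = "caf\u00e9 \u03b1\u03b2\u03b3"

# UTF-8 can represent every Unicode code point, so this encode succeeds.
utf8_bytes = text.encode("utf-8")

# Latin-1 has no Greek letters, so a strict encode raises an exception.
try:
    text.encode("latin-1")
    latin1_failed = False
except UnicodeEncodeError:
    latin1_failed = True

# Decoding with the same encoding that produced the bytes recovers the text.
round_tripped = utf8_bytes.decode("utf-8")
```

The round trip only works because the decode step names the same
encoding as the encode step, which is the "you NEED to know the
encoding" point above.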
Where I find Python a bit frustrating is that I don't think you can tell
it, on the application level, to use the "replace" or "ignore" flag by
default with encode(). I'd often rather get garbage than an exception
in my apps!
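
One possible workaround is to route every encode through a small helper
of your own. The sketch below assumes Python 3 syntax; lenient_encode is
a hypothetical name, not something Python provides:

```python
def lenient_encode(s, encoding="ascii", errors="replace"):
    """Hypothetical helper: encode text without raising on characters
    the target encoding cannot represent."""
    return s.encode(encoding, errors)


text = "caf\u00e9 \u03b1"

# "replace" substitutes a "?" for each character that cannot be encoded.
replaced = lenient_encode(text)

# "ignore" silently drops each character that cannot be encoded.
ignored = lenient_encode(text, errors="ignore")
```

Both handlers lose data, so this is the "garbage instead of an
exception" trade-off, made in one place rather than at every call.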
Anyway, if you use Unicode everywhere in your app, then you "only" have
to deal with these issues on I/O -- you need to know the encoding of
data you're reading in, and you need to provide the needed encoding for
data you're writing out. You need to do that right for ANSI too, so you
haven't lost anything.
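
A rough sketch of "Unicode inside, encodings only at the edges",
assuming Python 3 syntax and assuming UTF-8 as the file encoding (the
file name and sample text are made up for the example):

```python
import os
import tempfile

# Unicode text held internally by the application (sample data only).
text = "na\u00efve \u65e5\u672c\u8a9e"

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Encode on the way out: the file on disk holds UTF-8 bytes.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Decode on the way in, naming the same encoding explicitly.
with open(path, "r", encoding="utf-8") as f:
    read_back = f.read()

# Reading in binary mode shows the raw bytes that were actually stored.
with open(path, "rb") as f:
    raw = f.read()
```

Everything between the read and the write deals only with text, so the
encoding question comes up exactly twice.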
Which brings up another point touched on in this thread -- "only I/O".
It's very handy to use "print" with python, but that's I/O, and most
terminals don't seem to support unicode, or python doesn't know what
encoding to send to the terminal, so, again -- more pain.
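
For the "print" case, one hedged approach is to ask the output stream
what encoding it reports and degrade gracefully. This assumes Python 3
syntax; safe_print is a hypothetical helper, and falling back to ASCII
when the stream reports no encoding is my own assumption:

```python
import sys


def safe_print(text, stream=None):
    """Hypothetical helper: write text to a stream, replacing any
    characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    # Some streams report no encoding at all; assume ASCII in that case.
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the target encoding so unencodable characters
    # become "?" instead of raising UnicodeEncodeError on write.
    printable = text.encode(encoding, errors="replace").decode(encoding)
    stream.write(printable + "\n")


safe_print("caf\u00e9 \u03b1\u03b2\u03b3")
```

What actually shows up still depends on whether the terminal's real
encoding matches what Python thinks it is.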
So why bother with Unicode at all, if you still have "which encoding"
confusion? Two reasons:
-- unicode can hold more code points (theoretically all of them for
all languages), so you can have multiple languages supported within one
document, which can be a nice feature.
-- It's the way the computing world is going, so you're going to have to
deal with it anyway. If you start getting unicode data as input, you're
going to be a whole lot better off if you're using Unicode internally,
otherwise you WILL lose data (at best) or crash your app (at worst) if
you get data that can't be represented in ANSI.
We're dealing with this right now in a Web app that gets data from a lot
of sources, and it's using libs that aren't fully Unicode-aware. It's all too
easy for it to accept input from the browser, save it in the database,
then crash when you try to edit it again -- aarrgg!!
Chris Mellon wrote:
> Once you understand the fundamental difference between unicode and raw
> bytes, it's really not that difficult to understand, and
> encoding/decoding isn't that complicated.
Here's a good start to understanding Unicode:
http://www.joelonsoftware.com/articles/Unicode.html
And here's one that's Python-specific:
http://boodebr.org/main/python/all-about-python-and-unicode
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker at noaa.gov