[wxPython-users] unicode handling

Josiah Carlson jcarlson at uci.edu
Wed Aug 2 21:55:19 PDT 2006


The trick is whether the UTF-16 data is big-endian or little-endian.
Because a file encoded with the general "utf-16" codec can use either
byte order, it must begin with a Byte Order Mark (BOM) so that a reader
can tell which of the two was used.
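To make the BOM point concrete, here is a minimal sketch (Python 3, where file contents are `bytes`) that looks at the first two bytes and picks the matching byte-order-specific codec. The helper name `utf16_codec_from_bom` is hypothetical; `codecs.BOM_UTF16_LE` (FF FE) and `codecs.BOM_UTF16_BE` (FE FF) are standard-library constants. It assumes the data is known to be UTF-16 rather than UTF-32, whose little-endian BOM also starts with FF FE.

```python
import codecs


def utf16_codec_from_bom(data):
    """Return 'utf-16-le' or 'utf-16-be' if data starts with a UTF-16 BOM, else None.

    Hypothetical helper. Assumes the data is UTF-16, not UTF-32: a UTF-32-LE
    BOM (FF FE 00 00) would also match the little-endian check below.
    """
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    return None  # no BOM: the byte order cannot be determined from the data


# The same character really does differ between the two orders:
# "£".encode("utf-16-le") is b'\xa3\x00', "£".encode("utf-16-be") is b'\x00\xa3'.
```

Without a BOM the function returns `None`, which is essentially the situation that made the plain "utf-16" reader in the traceback give up.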

If you know for certain which byte order was used, you can open the file
as 'utf-16-be' or 'utf-16-le'.  Those byte-order-specific codecs do not
automatically prefix their output with a BOM, so you would need to add
one manually when writing.  If you have control over how the data is
written, I would suggest UTF-8 instead: it has no byte-order concerns,
and for text written mostly in the Latin alphabet it tends to be roughly
half the size of UTF-16 on disk.
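A minimal Python 3 sketch of the three points above: writing 'utf-16-le' with a hand-added BOM, reading it back with the explicit codec, and comparing UTF-8 and UTF-16 sizes for one line of the sample file. The function names are hypothetical; the thread itself uses Python 2.3, where the same idea applies but the types differ.

```python
import codecs


def encode_utf16_le_with_bom(text):
    # The 'utf-16-le' codec never emits a BOM, so prepend one by hand;
    # a reader using plain "utf-16" can then detect the byte order.
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def decode_utf16_le(data):
    # Decoding with 'utf-16-le' would keep a leading BOM as the character
    # U+FEFF, so strip it before decoding.
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]
    return data.decode("utf-16-le")


line = 'string MetaDataValue = "£500";'
# 30 characters: UTF-8 needs 2 bytes only for "£"; UTF-16 needs 2 bytes for each.
print(len(line.encode("utf-8")), len(line.encode("utf-16-le")))  # 31 60
```

Because the BOM is present, the encoded bytes can also be read back with the general "utf-16" codec, which consumes the BOM and picks the right order.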

 - Josiah


"Thomas Thomas" <thomas at mindz-i.co.nz> wrote:
> Hi all,
> 
> I have a file with special characters such as "£".  I need to
> read the file into a list of unicode strings.
> How can I do this?  I tried codecs:
> 
> import codecs
> filename='d:/poll/test.XST'
> metaHash={}
> infile = codecs.open(filename, "r", encoding='utf-16')
> text = infile.read().split('\n')
> print text
> 
> I am getting the error
> 
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "c:/DOCUME~1/ADMINI~1/LOCALS~1/Temp/python-1928Lij.py", line 9, in ?
>     text = infile.read().split('\n')
>   File "C:\Python23\lib\codecs.py", line 380, in read
>     return self.reader.read(size)
>   File "C:\Python23\lib\encodings\utf_16.py", line 48, in read
>     raise UnicodeError,"UTF-16 stream does not start with BOM"
> UnicodeError: UTF-16 stream does not start with BOM
> 
> Also, here is a sample of the file content:
> string MetaDataPrompt = "Discovery No";
> 
> string MetaDataFieldName = "Discovery No";
> 
> string MetaDataType = "string";
> 
> string MetaDataValue = "£500";
> 
> }
> 
> 3{
> 
> string MetaDataPrompt = "comments";
> 
> string MetaDataFieldName = "Comments";
> 
> string MetaDataType = "string";
> 
> string MetaDataValue = "Energy Scope £500";
> 
> 
> 
> I know I should have asked this on python-list and not on wxPython.
> When "£" is entered through the GUI everything works fine; it is
> only when I read it from a file that I have problems.  So I thought I
> would try here as well.
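For the original goal of getting the file into a list of unicode strings, one possible Python 3 sketch combines the advice above: let a BOM choose the codec when one is present, otherwise fall back to an encoding the caller must already know (for example 'utf-16-le'). The name `decode_lines` and its `default_encoding` parameter are hypothetical, UTF-32 BOMs are deliberately not handled, and `splitlines()` is used because `split('\n')` would leave a trailing '\r' on each line of a Windows file.

```python
import codecs

# BOMs checked in order; UTF-8's BOM is included because some editors write it.
# Assumption: UTF-32 files are not expected (their LE BOM starts with FF FE too).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_lines(data, default_encoding):
    """Decode raw file bytes into a list of str lines (hypothetical helper).

    A recognised BOM selects the codec and is stripped; otherwise
    default_encoding, which the caller must know, is used.
    """
    encoding = default_encoding
    for bom, name in _BOMS:
        if data.startswith(bom):
            data = data[len(bom):]
            encoding = name
            break
    # splitlines() handles both '\r\n' and '\n' line endings.
    return data.decode(encoding).splitlines()
```

Usage would look like `with open(filename, "rb") as f: lines = decode_lines(f.read(), "utf-16-le")`, where "utf-16-le" is only an example of a known byte order, not something determined from the file.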
