[wx-dev] Re: decomposed UTF8 for fn_str
Stefan Csomor
csomor at advancedconcepts.ch
Sun Dec 2 13:03:06 PST 2007
Hi Vaclav
On 02.12.2007, at 21:09, Vaclav Slavik wrote:
>> so opening files named äöü.txt cannot work anymore.
>
> Actually, this should work all right as far as I can tell, because any
> forms of UTF-8 are accepted by OS X file functions, it's just that
> when OS X API call returns a filename, it's decomposed...
Then I must have hit some strange corner case earlier, because I
had to introduce decomposition in order for things to work, so I
assumed it had to be done for all calls. But according to the
documentation you are right. I've tested with a manual UTF-8 string and
fopen, and it worked correctly in non-decomposed form, so I'll have to
find out why it doesn't work with wxFFile.Open.
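
To make the two forms concrete, here is a minimal sketch (Python is used
only for illustration, it is not wx code) of what the composed and
decomposed UTF-8 byte sequences for the file name above look like. The
variable names are made up for this example.

```python
import unicodedata

# "äöü.txt" written with precomposed characters (NFC).
name = "\u00e4\u00f6\u00fc.txt"

nfc = unicodedata.normalize("NFC", name)  # composed form
nfd = unicodedata.normalize("NFD", name)  # decomposed form, as returned by OS X

nfc_bytes = nfc.encode("utf-8")  # each umlaut is one code point, 2 bytes
nfd_bytes = nfd.encode("utf-8")  # base letter (1 byte) + U+0308 (2 bytes)

print(nfc_bytes)
print(nfd_bytes)
```

Both byte strings are valid UTF-8 and canonically equivalent, but they
differ byte for byte, so a plain byte comparison of the two names fails.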
>
>> Now the question is :
>>
>> - should I change just the method fn_str to do just this ?
>
> Another alternative that looks much better to me is to not do any
> conversion in fn_str(), because OS X functions _do_ accept any form
> of Unicode strings and only convert back from decomposed form to NFC
> when obtaining strings from OS X API calls -- i.e. not when fn_str()
> is called, but when you're creating wxString from char* value
> obtained from some OS X files-related call.
But it still needs to be in UTF-8, so some conversion call will still
have to be there.
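
A rough sketch of that split, again in Python for illustration only:
the outgoing direction only encodes to UTF-8, and the incoming direction
decodes and then recomposes. The names to_fs_bytes and from_fs_bytes are
hypothetical stand-ins, not existing wx functions.

```python
import unicodedata

def to_fs_bytes(s: str) -> bytes:
    """Outgoing direction (the fn_str() role): encode to UTF-8 only.

    No normalization is applied, on the assumption stated above that
    the OS X file functions accept any form.
    """
    return s.encode("utf-8")

def from_fs_bytes(raw: bytes) -> str:
    """Incoming direction: decode UTF-8, then recompose to NFC."""
    return unicodedata.normalize("NFC", raw.decode("utf-8"))

# A decomposed name, as an OS X API call might return it.
returned = b"a\xcc\x88.txt"
print(from_fs_bytes(returned) == "\u00e4.txt")
```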
>
>
> The rationale is that NFC is the de-facto standard (looks like
> everybody except Apple in FS calls uses it) and so we should keep the
> data in this form as much as possible.
>
>> - should we have a 'normalization' attribute somewhere when creating
>> a converter
>> - should we have a special encoding constant for UTF8-Decomposed ?
>
> We should have neither, because normalization has nothing to do with
> UTF-8, it's equally valid for any other Unicode representation,
> including wxString's internal one on all platforms. We probably
> should have some normalization code operating on wxString (i.e. on
> decoded Unicode values) -- if nothing else, then because they're
> needed to implement fn_str() on OS X.
Yes, I should have been clearer on that. As a test, I've added a
decompose member to the CF-based string conversions; the point was
that it was supposedly needed for UTF-8 only, so instead of boosting
the number of encodings, I thought to add only the one that was needed.
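
The point that normalization is independent of the encoding can be
sketched like this (Python, illustration only; encode_normalized is a
hypothetical helper, not a proposal for the actual converter API). The
normalization step works on decoded code points, and the encoding is
chosen separately afterwards.

```python
import unicodedata

def encode_normalized(text: str, form: str, encoding: str) -> bytes:
    """Normalize the decoded code points first, then encode them."""
    return unicodedata.normalize(form, text).encode(encoding)

composed = "\u00e4"  # "ä" as a single code point

# The same decomposition step, followed by two different encodings.
utf8 = encode_normalized(composed, "NFD", "utf-8")
utf16 = encode_normalized(composed, "NFD", "utf-16-le")

# Decoding either result yields the same two code points: 'a' + U+0308.
print(utf8.decode("utf-8") == utf16.decode("utf-16-le"))
```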
> But I'm not sure what exactly to do outside of fn_str()... Assuming we
> stick to using NFC in wxString (and we really, really should: it's
> what everybody does and it's more "natural" in that length() or
> index-based access does the expected thing [more often], while in
> NFD, it's not even the case with latin1). We could:
>
> (a) Assume wxStrings are in NFC (because it's what everybody except
> Apple uses) and don't do anything special, except in some OS X
> specific code for creating strings from char*.
>
> (b) Assume NFC, but also convert all wxStrings into normalized form on
> creation. This is hopefully trivial for iso-8859-*, I don't know about
> various Asian MB encodings and it would certainly require
> post-processing when creating Unicode string from any of the UTF-*
> encodings. It would cost us additional processing for every wxString
> conversion, but OTOH, it would guarantee the NFC assumption holds and
> it would automagically solve the problem for OS X because as soon as
> a wxString would be created from char*, it would also be normalized.
I'd support (a). I'm recomposing all pathnames that come from file
dialogs so that we are sure that we have NFC here. So this would
probably leave the task of having a 'normalization' flag / function in
the converter.
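
As a small illustration of the length() / index argument quoted above
(Python, illustration only, with made-up variable names): the same three
letters have different lengths in the two forms, and the decomposed form
no longer fits into latin1 at all.

```python
import unicodedata

text = "\u00e4\u00f6\u00fc"  # "äöü", all three characters exist in latin1

nfc = unicodedata.normalize("NFC", text)
nfd = unicodedata.normalize("NFD", text)

print(len(nfc), len(nfd))  # 3 code points versus 6 code points

# In NFD, index 1 is the combining diaeresis, not the second letter.
second_is_combining = nfd[1] == "\u0308"

# The combining mark has no latin1 equivalent, so encoding NFD fails.
try:
    nfd.encode("latin-1")
    nfd_fits_latin1 = True
except UnicodeEncodeError:
    nfd_fits_latin1 = False
```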
Thanks,
Stefan