[wx-dev] UTF-8 for internal string processing

Jeff Tupper tupperlists at gmail.com
Sun Apr 6 13:18:10 PDT 2008


On Sun, Apr 6, 2008 at 11:41 AM, Robert Roebling <robert at roebling.de> wrote:
>  The current wxString class is an opaque class where the user doesn't
>  know what encoding the class is using and the class (or rather
>  wxWidgets) will choose what is "the best" encoding for the majority
>  of cases.

I'm not sure if this sort of reasoning (average / majority case) is
appropriate. At the least I think it needs some more precision.
Consider:

1. Even if UTF-8 makes most wxString operations in an application a
little faster, the occasions on which it slows down string processing
by a factor of one thousand or more are enough to question its use for
all wxString processing. (While end-users will appreciate a program
that's a little snappier, they may well not put up with one that
occasionally seems to hang.)

2. Even if the majority of applications get a little snappier, what
about the applications that become completely unusable? Average-case
arguments are not as compelling for the design of a library as for the
design of an application.
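
To make the factor in point 1 concrete, here is a minimal sketch of
where such a slowdown could come from. It is not wx code: utf8_offset
and indexed_loop_cost are hypothetical helpers, and the sketch assumes
valid UTF-8 and a naive implementation that rescans from the start on
every index lookup, with no position caching.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Byte offset of the code point at position `index` in a valid UTF-8
// string. The scan starts at byte 0, so the cost grows with `index`.
// `steps` accumulates the number of bytes examined.
std::size_t utf8_offset(const std::string& s, std::size_t index,
                        std::size_t& steps) {
    std::size_t pos = 0;
    while (index > 0 && pos < s.size()) {
        ++pos;
        ++steps;
        // Skip continuation bytes, which have the form 10xxxxxx.
        while (pos < s.size() &&
               (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80) {
            ++pos;
            ++steps;
        }
        --index;
    }
    assert(pos <= s.size());
    return pos;
}

// Cost of visiting characters 0..n_chars-1 by index, as a loop like
// `for (i = 0; i < n; ++i) use(s[i]);` would do.
// Returns the total number of bytes examined.
std::size_t indexed_loop_cost(const std::string& s, std::size_t n_chars) {
    std::size_t steps = 0;
    for (std::size_t i = 0; i < n_chars; ++i) {
        utf8_offset(s, i, steps);
    }
    return steps;
}
```

For a 2000-character ASCII string, indexed_loop_cost examines
1,999,000 bytes, where a fixed-width encoding would need 2000 direct
lookups. That is roughly the thousandfold gap, and it keeps growing
with string length.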

Both considerations can be unified by simply asking over what range
the complexity is being averaged. (The majority of what, exactly?)
Average-case arguments are most compelling if this range is narrow.
For example, if changes in wxString caused improved performance in any
application when averaged over a millisecond, I would think that's
pretty good. I'm not such a fan when one group of users benefits a
small amount while another is heavily penalized (even if one group is
larger than the other).

And, just to reiterate: I'm not asking that UTF-8 be removed as an
option. I just don't see why it should be the only option on some
platforms.



>  the user should have the option to dictate what encoding
>  wxString uses and this should be possible at run-time (to avoid
>  having different builds again).

This would certainly bring up a bunch of new options, such as
automatically using UTF-16 or UTF-32 for larger strings that are being
worked on by an application. One issue is that the Windows UTF-16
build has different semantics than the other builds (surrogate pair
handling is not done by wx, but by the application).
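
As an illustration of what surrogate pair handling in the application
involves, here is a minimal sketch. It is not wx code: it uses
std::u16string as a stand-in for a UTF-16 string, and
next_code_point and count_code_points are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
bool is_low_surrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Decode the code point that starts at s[i] and advance i past it.
// An unpaired surrogate is passed through unchanged.
char32_t next_code_point(const std::u16string& s, std::size_t& i) {
    assert(i < s.size());
    char16_t hi = s[i++];
    if (is_high_surrogate(hi) && i < s.size() && is_low_surrogate(s[i])) {
        char16_t lo = s[i++];
        return 0x10000 + ((char32_t(hi) - 0xD800) << 10)
                       + (char32_t(lo) - 0xDC00);
    }
    return hi;
}

// Number of code points, which can be smaller than s.size()
// (the number of 16-bit code units).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        next_code_point(s, i);
    }
    return n;
}
```

With the string u"a\U0001D11E", size() reports 3 code units while
count_code_points reports 2 characters, which is the kind of mismatch
an application has to account for itself when indexing by code unit.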

