[wx-dev] UTF-8 for internal string processing
Vadim Zeitlin
vadim at wxwidgets.org
Tue Apr 1 14:39:25 PDT 2008
On Tue, 1 Apr 2008 11:39:30 -0700 Jeff Tupper <tupperlists at gmail.com> wrote:
JT> I'm just a bit confused about why UTF-8 is replacing other encoding
JT> systems on some platforms.
To avoid unnecessary conversions between the strings used by wx and the
strings used by the underlying toolkit (i.e. GTK+ or DirectFB).
JT> To even retain similar performance, one's code must be rewritten to
JT> use string iterators instead of integer indices.
Did you measure performance difference between using iterators and
indices? If yes, could we please see the results of your benchmarks?
Otherwise I don't really understand the use of the words "even" and "similar".
JT> Unfortunately, the string iterators don't have quite the same semantics
JT> so one can't simply replace integers with iterators freely.
Do you have any reasonable examples where replacing indices with iterators
is a problem?
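To make the iterator-versus-index question concrete, here is a minimal sketch of why the two can differ in cost for a variable-width encoding such as UTF-8. It works on a plain std::string holding UTF-8 bytes; all function names are hypothetical, and this is not how wxString itself is implemented (a real implementation may well cache positions to soften the indexing cost).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True for UTF-8 continuation bytes (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Iterator-style step: move a byte offset to the start of the next
// code point. Touches at most 4 bytes for well-formed UTF-8.
inline std::size_t next_code_point(const std::string& s, std::size_t pos) {
    assert(pos < s.size());
    ++pos;
    while (pos < s.size() &&
           is_continuation(static_cast<unsigned char>(s[pos]))) {
        ++pos;
    }
    return pos;
}

// Index-style lookup: the byte offset of the n-th code point can only be
// found by scanning from the beginning, so the cost grows with n.
inline std::size_t offset_of_index(const std::string& s, std::size_t n) {
    std::size_t pos = 0;
    while (n > 0 && pos < s.size()) {
        pos = next_code_point(s, pos);
        --n;
    }
    return pos;
}

// Counting code points by stepping: linear in the string length.
inline std::size_t count_iterating(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t pos = 0; pos < s.size(); pos = next_code_point(s, pos)) {
        ++count;
    }
    return count;
}

// The same count written as a naive index loop: every lookup rescans the
// string from the start, so the whole loop is quadratic.
inline std::size_t count_indexing(const std::string& s) {
    std::size_t count = 0;
    while (offset_of_index(s, count) < s.size()) {
        ++count;
    }
    return count;
}
```

Both counting functions return the same result. Only the amount of rescanning differs, and whether that matters in practice is exactly what a benchmark would have to show.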
JT> Will other encoding systems still be an option on all platforms?
It should still work if you use the appropriate configure option, but I
don't know whether anybody tests it regularly, so it might be broken.
JT> 1. Make Unicode easier to use, especially avoid wxT() macro which
JT> confuses people a lot and, as far as possible, don't expose wchar_t
JT> neither
JT>
JT> - We're not confused by wxT() etc.
You, personally, might not be. But many, many people (and I do speak from
experience here) are. Anyhow, the use of a simplified, Unicode-neutral API
has nothing to do with using UTF-8 internally in wxString, as the same API
is used for the wchar_t-based wxString implementation too.
JT> 2. Allow using UTF-8 internally in wxString too
JT> * Save space (especially important for embedded systems, whether
JT> wxGTK-based (such as Maemo), or wxDFB ones)
JT>
JT> - Whether or not UTF-8 saves space depends on usage and what it's
JT> being compared to. For example, UTF-16 typically takes less space than
JT> UTF-8 for CJK text. (So CJK users will not only see a speed decrease
JT> but also a memory increase relative to UTF-16.)
It so happens that most embedded systems projects using wx use latin1
right now, so using UTF-8 does save space. Besides, as long as the
underlying toolkit uses UTF-8, you save the memory and code used for
(constant) translation between the two.
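The size trade-off under discussion follows directly from the encoding rules, and a small sketch can show it. The helper names below are hypothetical; they only count how many bytes a sequence of Unicode code points would occupy in each encoding and say nothing about any particular wxString build.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store the given code points as UTF-8:
// 1 byte below U+0080, 2 below U+0800, 3 below U+10000, otherwise 4.
inline std::size_t utf8_bytes(const std::u32string& text) {
    std::size_t n = 0;
    for (char32_t c : text) {
        if (c < 0x80)         n += 1;
        else if (c < 0x800)   n += 2;
        else if (c < 0x10000) n += 3;
        else                  n += 4;
    }
    return n;
}

// Bytes needed to store the given code points as UTF-16:
// one 2-byte unit inside the BMP, a 4-byte surrogate pair outside it.
inline std::size_t utf16_bytes(const std::u32string& text) {
    std::size_t n = 0;
    for (char32_t c : text) {
        n += (c < 0x10000) ? 2 : 4;
    }
    return n;
}
```

For ASCII text such as U"hello" this gives 5 bytes in UTF-8 against 10 in UTF-16, while for three CJK characters in the BMP it gives 9 against 6. Which case dominates depends on the text the application actually handles.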
JT> There are other approaches for saving space.
This looks interesting, could you please tell us more about this?
JT> * Avoid conversions between wxString and the GUI toolkit (again,
JT> applies to both wxGTK and wxDFB and maybe even wxMotif)
JT>
JT> - Ah, here we go. Unfortunately, if performance is degraded too much,
I have no reason to believe it is. Do you?
JT> 3. Keep compatibility with the existing ANSI build: the goal here is
JT> 100% backwards compatibility at the source code level
JT>
JT> - We don't use the ANSI build.
Good for you!
Err, besides that, was there any point in saying this? Or do you mean that
the fact that you don't use it somehow implies that nobody does? I find
your message surprisingly egocentric, not that there is anything wrong with
it, of course, but hopefully you do realize that we don't design wxWidgets
taking only the needs of a single user into account.
JT> 4. Keep compatibility with the existing Unicode build: Unicode and
JT> ANSI builds are currently orthogonal, so it's going to be impossible
JT> to stay compatible with both and compatibility with the ANSI build is
JT> more important because people using the Unicode build hopefully
JT> understand Unicode better and so will have less problems updating
JT> their code. Still, compatibility with the existing code should be
JT> preserved as much as possible.
JT>
JT> - So those actually using Unicode are the lowest priority when it
JT> comes to changing Unicode support details.
Yes, when the choice is between breaking the code of someone who doesn't
know about Unicode and someone who does _and_there_is_no_way_to_avoid_it_
we prefer to do the latter exactly because the fix will hopefully be more
obvious to a Unicode-savvy person.
Regards,
VZ