[wx-dev] UTF-8 for internal string processing
Julian Smart
julian at anthemion.co.uk
Tue Apr 1 11:50:49 PDT 2008
This does really concern me too. I haven't yet compiled my apps with
trunk/2.9 but my apps use strings a lot and I am somewhat filled with
dread about having to rewrite major amounts of code just to
stay in one place.
Transferring strings between the application and the GUI would, I think,
be a relatively small overhead, whereas internal string manipulation is
likely to be far more significant.
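To make the worry about internal manipulation concrete, here is a minimal sketch of why integer indexing gets expensive once the storage is variable-width. It uses plain std::string holding valid UTF-8 and does not touch wxString at all; the helper names (Utf8OffsetOf, Utf8Length, CountSpacesByIndex, CountSpacesByScan) are made up for illustration and are not wxWidgets API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if the byte is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool IsContinuation(char c)
{
    return (static_cast<unsigned char>(c) & 0xC0) == 0x80;
}

// Hypothetical helper: byte offset of the n-th character in valid UTF-8.
// There is no way to jump there directly, so it scans from the start: O(n).
std::size_t Utf8OffsetOf(const std::string& s, std::size_t n)
{
    std::size_t i = 0;
    while (n > 0 && i < s.size()) {
        ++i;                                    // step over the lead byte
        while (i < s.size() && IsContinuation(s[i]))
            ++i;                                // and its continuation bytes
        --n;
    }
    return i;
}

// Number of characters (code points), i.e. bytes that are not continuations.
std::size_t Utf8Length(const std::string& s)
{
    std::size_t len = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        if (!IsContinuation(s[i]))
            ++len;
    return len;
}

// Index-style loop: every "s[k]" rescans from the start, so O(n^2) overall.
std::size_t CountSpacesByIndex(const std::string& s)
{
    std::size_t count = 0;
    const std::size_t len = Utf8Length(s);
    for (std::size_t k = 0; k < len; ++k)
        if (s[Utf8OffsetOf(s, k)] == ' ')
            ++count;
    return count;
}

// Iterator-style loop: one forward pass over the bytes, O(n).
std::size_t CountSpacesByScan(const std::string& s)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ) {
        if (s[i] == ' ')
            ++count;
        ++i;
        while (i < s.size() && IsContinuation(s[i]))
            ++i;
    }
    return count;
}
```

Both loops give the same answer, but the index-style one does quadratic work, which is the kind of code that would need rewriting to keep its current performance.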
I hope that it will be possible to compile wxWidgets and one's app in a
compatibility mode that uses the same representation as before. However,
this rather negates the goal of one-build-fits-all and would mean
internal wxWidgets code (and any libraries) would still have to be
written using wxT().
I just hope this isn't a shark-jumping moment for wxWidgets :-( Will
most users really want to rewrite their apps to retain current levels of
performance? That's not even an option for everyone.
Regards,
Julian
Jeff Tupper wrote:
> Vaclav and I have been busy patronizing each other in a marathon
> off-topic chatting session in the bug tracker. I don't want to
> pollute the wx-dev mailing list, but I do think Unicode support is an
> important issue for wxWidgets and think that some more discussion
> would be useful.
>
> I'm just a bit confused about why UTF-8 is replacing other encoding
> systems on some platforms.
>
> Doing so will, at best, improve performance by a small factor. But it
> can degrade performance by orders of magnitude. (Working with a
> megabyte string can become 1,000,000 times slower: months instead of
> seconds.)
>
> To even retain similar performance, one's code must be rewritten to
> use string iterators instead of integer indices. Unfortunately, the
> string iterators don't have quite the same semantics so one can't
> simply replace integers with iterators freely.
>
> Will other encoding systems still be an option on all platforms?
>
> ------------
>
> I wanted to keep this short and to the point, but I'll include my
> comments (as someone familiar with MBCS, Unicode, etc.) regarding the
> "goals" listed at
> http://www.wxwidgets.org/wiki/index.php/Development:_UTF-8_Support.
>
> 1. Make Unicode easier to use, especially avoid wxT() macro which
> confuses people a lot and, as far as possible, don't expose wchar_t
> either
>
> - We're not confused by wxT() etc. (A side comment: in my
> applications, most strings aren't stored in the source code.)
>
>
>
> 2. Allow using UTF-8 internally in wxString too
> * Save space (especially important for embedded systems, whether
> wxGTK-based (such as Maemo), or wxDFB ones)
>
> - Whether or not UTF-8 saves space depends on usage and what it's
> being compared to. For example, UTF-16 typically takes less space than
> UTF-8 for CJK text. (So CJK users will not only see a speed decrease
> but also a memory increase relative to UTF-16.) There are other
> approaches for saving space.
>
>
>
> * Avoid conversions between wxString and the GUI toolkit (again,
> applies to both wxGTK and wxDFB and maybe even wxMotif)
>
> - Ah, here we go. Unfortunately, if performance is degraded too much,
> wxString will no longer be a general string class. So we may well be
> back to doing conversions. And we'll still be doing string conversions
> when dealing with other APIs that don't use UTF-8.
>
>
>
> 3. Keep compatibility with the existing ANSI build: the goal here is
> 100% backwards compatibility at the source code level
>
> - We don't use the ANSI build.
>
>
>
> 4. Keep compatibility with the existing Unicode build: Unicode and
> ANSI builds are currently orthogonal, so it's going to be impossible
> to stay compatible with both and compatibility with the ANSI build is
> more important because people using the Unicode build hopefully
> understand Unicode better and so will have fewer problems updating
> their code. Still, compatibility with the existing code should be
> preserved as much as possible.
>
> - So those actually using Unicode are the lowest priority when it
> comes to changing Unicode support details.
>
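On the space point under goal 2 in the quoted message, the arithmetic can be checked with a small sketch. It assumes a C++11 compiler (for char32_t and std::u32string) and valid Unicode scalar values as input; the function names are hypothetical and unrelated to wxWidgets.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to encode one code point in UTF-8.
std::size_t Utf8BytesFor(char32_t cp)
{
    if (cp < 0x80)    return 1;   // ASCII
    if (cp < 0x800)   return 2;   // Latin supplements, Greek, Cyrillic, ...
    if (cp < 0x10000) return 3;   // rest of the BMP, including most CJK
    return 4;                     // supplementary planes
}

// Bytes needed to encode one code point in UTF-16.
std::size_t Utf16BytesFor(char32_t cp)
{
    return cp < 0x10000 ? 2 : 4;  // one 16-bit unit, or a surrogate pair
}

// Total encoded size of a whole string in UTF-8.
std::size_t Utf8Bytes(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text)
        total += Utf8BytesFor(cp);
    return total;
}

// Total encoded size of a whole string in UTF-16.
std::size_t Utf16Bytes(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text)
        total += Utf16BytesFor(cp);
    return total;
}
```

For pure ASCII text UTF-8 is half the size of UTF-16, while for text made of BMP CJK characters it is one and a half times the size, so which encoding saves space does depend on the data.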
--
Julian Smart, Anthemion Software Ltd.
28/5 Gillespie Crescent, Edinburgh, Midlothian, EH10 4HU
www.anthemion.co.uk | +44 (0)131 229 5306
Tools for writers: www.writerscafe.co.uk
wxWidgets RAD: www.anthemion.co.uk/dialogblocks