[wx-dev] Improving wxString proposal (Re: #9672: wx 2.9 Ansi/Unicode Combi-wxString)
Vadim Zeitlin
vadim at wxwidgets.org
Tue Jul 8 06:19:23 PDT 2008
On Tue, 08 Jul 2008 15:12:40 +0200 Robert Roebling wrote:
RR> These two sentences would make one believe that UTF-8 support
RR> has something to do with the removal of the ANSI build, but
RR> it obviously does not. The ANSI build could have been removed
RR> by adding the various char* overloads, automatic conversions
RR> and c_str() magic totally without any UTF-8 changes.
Yes, this is correct. The ANSI/Unicode-neutral API and the UTF-8 internal
representation are orthogonal, except that the former is practically
required by the latter, so it made sense to implement both at the same time.
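To make the "char* overloads, automatic conversions and c_str() magic" idea a bit more concrete, here is a minimal C++ sketch of a string class that accepts both narrow and wide input and whose c_str() returns a proxy convertible to either pointer type. NeutralString and CStrProxy are hypothetical names, not wxString's actual classes, and the sketch assumes narrow input is plain ASCII so that conversion is a per-character copy; a real implementation would need a proper converter (locale or UTF-8).

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical sketch, NOT wxString: stores text as std::wstring and
// assumes all narrow (char) input is plain ASCII.
class NeutralString {
public:
    // "char* overloads": both narrow and wide literals are accepted.
    NeutralString(const char* s) : m_wide(s, s + std::strlen(s)) {}
    NeutralString(const wchar_t* s) : m_wide(s) {}

    // Proxy returned by c_str(), implicitly convertible to either pointer type.
    class CStrProxy {
    public:
        explicit CStrProxy(const NeutralString& owner) : m_owner(owner) {}

        operator const wchar_t*() const { return m_owner.m_wide.c_str(); }

        // Builds a narrow copy on demand (ASCII-only assumption) and keeps it
        // alive inside the owning string.
        operator const char*() const {
            m_owner.m_narrow.clear();
            for (wchar_t wc : m_owner.m_wide)
                m_owner.m_narrow += static_cast<char>(wc);
            return m_owner.m_narrow.c_str();
        }

    private:
        const NeutralString& m_owner;
    };

    CStrProxy c_str() const { return CStrProxy(*this); }

private:
    std::wstring m_wide;
    mutable std::string m_narrow;  // cache for the narrow conversion
};
```

With this, both `const char* p = s.c_str();` and `const wchar_t* w = s.c_str();` compile against the same object, which is the kind of source compatibility the ANSI/Unicode-neutral API is meant to provide.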
RR> It would have been correct to write:
RR>
RR> a) We want to abandon the ANSI build mode by providing various
RR> char * overloads, automatic conversions and c_str() magic
RR> and thus make wxWidgets always Unicode-aware with maximal
RR> (but not total) backwards compatibility.
RR>
RR> b) We separately want to use UTF-16 as the internal storage format
RR> for wxString under Windows (as before) and UTF-8 under Linux
RR> and OS X (in contrast to UCS4) with an identical interface in
RR> either case.
RR>
RR> c) Using UTF-8 under Linux may sometimes have a beneficial effect
RR> on performance since conversion to and from UTF-8 before calling
RR> GUI functions (GTK+) as well as CRT functions (glibc) will no
RR> longer be required and memory requirements per string will also
RR> often be reduced compared to storing strings in UCS4.
RR>
RR> d) For people requiring O(1) indexed access to Unicode strings under
RR> Linux and OS X, the library can still be compiled in wchar_t/UCS4 mode.
RR> Alternatively, strings can be converted to std::wstring and then
RR> processed further (and later hopefully also to char16_t-based
RR> strings).
This is a good summary.
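The trade-off behind c) and d) can be illustrated with a small C++ sketch: the same text takes 4 bytes per code point as UCS4 (std::u32string) but often fewer as UTF-8, while finding the n-th code point in UTF-8 requires a linear scan over continuation bytes instead of O(1) indexing. The helper names are hypothetical, and the input is assumed to contain only valid code points (no surrogates, nothing above U+10FFFF).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one code point (assumed valid) to `out`.
void AppendUtf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {                       // 1 byte: ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert a UCS4 string (one char32_t per code point) to UTF-8.
std::string ToUtf8(const std::u32string& ucs4) {
    std::string out;
    for (char32_t cp : ucs4) AppendUtf8(out, cp);
    return out;
}

// Byte offset of the n-th code point in a UTF-8 string. This is O(n):
// each step skips a lead byte plus any continuation bytes (10xxxxxx).
std::size_t Utf8OffsetOf(const std::string& s, std::size_t n) {
    std::size_t pos = 0;
    while (n > 0 && pos < s.size()) {
        ++pos;
        while (pos < s.size() &&
               (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80)
            ++pos;
        --n;
    }
    return pos;
}
```

For example, U"h\u00e9llo" is 5 code points: 20 bytes as UCS4 but 6 bytes as UTF-8, and Utf8OffsetOf() has to walk past the two-byte "é" to find that code point 2 starts at byte 3.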
RR> The main point of criticism is that d) wasn't required before
RR> and little data suggest that c) is relevant. Indeed, most
RR> likely, it is not.
I admit that I underestimated how often code iterates over long strings
using indices, and this means that we absolutely need to add iterator
caching to the UTF-8 build. This is why I'd prefer to return to the
question of UTF-8 build performance after this is done. I hope that caching
~2 positions per UTF-8 string will be enough to restore decent performance
for most existing code.
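A minimal sketch of the kind of position caching mentioned above, assuming a read-only UTF-8 buffer: remember the last two (code point index, byte offset) pairs and resume scanning from the closest one that does not overshoot the requested index. Utf8IndexCache is a hypothetical class, not the actual wxString code, and a real implementation would also have to invalidate the cache whenever the string is modified.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical sketch, NOT wxString: maps code point indices to byte
// offsets in an immutable UTF-8 string, caching the last two lookups.
class Utf8IndexCache {
public:
    explicit Utf8IndexCache(std::string utf8) : m_str(std::move(utf8)) {}

    // Byte offset of the n-th code point (m_str.size() if n is past the end).
    std::size_t OffsetOf(std::size_t n) {
        std::size_t index = 0, pos = 0;
        // Resume from the furthest cached position that does not overshoot n.
        for (const Entry& e : m_cache) {
            if (e.valid && e.index <= n && e.index >= index) {
                index = e.index;
                pos = e.offset;
            }
        }
        // Scan forward one code point at a time, skipping continuation bytes.
        while (index < n && pos < m_str.size()) {
            ++pos;
            ++m_bytesScanned;
            while (pos < m_str.size() &&
                   (static_cast<unsigned char>(m_str[pos]) & 0xC0) == 0x80) {
                ++pos;
                ++m_bytesScanned;
            }
            ++index;
        }
        // Remember this lookup, overwriting the older of the two slots.
        Entry& slot = m_cache[m_next];
        slot.valid = true;
        slot.index = index;
        slot.offset = pos;
        m_next = (m_next + 1) % m_cache.size();
        return pos;
    }

    // Total bytes stepped over so far, to show how much work caching saves.
    std::size_t BytesScanned() const { return m_bytesScanned; }

private:
    struct Entry {
        bool valid = false;
        std::size_t index = 0;
        std::size_t offset = 0;
    };
    std::string m_str;
    std::array<Entry, 2> m_cache{};
    std::size_t m_next = 0;
    std::size_t m_bytesScanned = 0;
};
```

With this, a loop calling OffsetOf(i) for i = 0..999 over 1000 ASCII characters scans 999 bytes in total, whereas restarting from the beginning every time would scan 0 + 1 + ... + 999 = 499500 bytes. Keeping two slots rather than one is meant to help loops that alternate between two indices (e.g. comparing s[i] with s[j]).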
Regards,
VZ