[wx-dev] Improving wxString proposal (Re: #9672: wx 2.9 Ansi/Unicode Combi-wxString)

Vadim Zeitlin vadim at wxwidgets.org
Tue Jul 8 06:19:23 PDT 2008


On Tue, 08 Jul 2008 15:12:40 +0200 Robert Roebling wrote:

RR> These two sentences would make one believe that UTF-8 support
RR> has anything to do with the removal of the ANSI build, but
RR> it obviously has not. The ANSI build could have been removed
RR> by adding the various char* overloads, automatic conversions
RR> and c_str() magic totally without any UTF-8 changes.

 Yes, this is correct. ANSI/Unicode neutral API and UTF-8 internal
representation are orthogonal except that the former is practically
required by the latter so it made sense to implement both at the same time.

RR> It would have been correct to write:
RR> 
RR> a) We want to abandon the ANSI build mode by providing various
RR>    char * overloads, automatic conversions and c_str() magic
RR>    and thus make wxWidgets always Unicode aware with a maximum
RR>    of (but not total) backwards compatibility.
RR> 
RR> b) We separately want to use UTF-16 as the internal storage format
RR>    for wxString under Windows (as before) and UTF-8 under Linux
RR>    and OS X (in contrast to UCS4) with an identical interface in
RR>    either case.
RR> 
RR> c) Using UTF-8 under Linux may sometimes have a beneficial effect
RR>    on performance since conversion to and from UTF-8 before calling
RR>    GUI functions (GTK+) as well as CRT functions (glibc) will no
RR>    longer be required and memory requirements per string will also
RR>    often be reduced compared to storing strings in UCS4.
RR> 
RR> d) For people requiring O(1) access to Unicode strings under Linux
RR>    and OS X, the library can still be compiled in wchar_t/UCS4 mode.
RR>    Alternatively, strings can be converted to std::wstring and then
RR>    processed further (and later hopefully also to char16_t based
RR>    strings).

 This is a good summary.

RR> The main point of critique is that d) wasn't required before
RR> and little data suggest that c) is relevant. Indeed, most
RR> likely it is not.

 I admit that I underestimated how often existing code iterates over long
strings using indices. This means that we absolutely need to add iterator
caching in the UTF-8 build, which is why I'd prefer to return to the question
of the UTF-8 build's performance after this is done. I hope that caching ~2
positions in a UTF-8 string would be enough to restore decent performance
for most of the existing code.
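
 To illustrate the idea, here is a minimal sketch of such a cache. It is not
wxString code: the class CachedUtf8Index and its members are hypothetical
names, it remembers only one position rather than ~2, and it assumes the
input is valid UTF-8. It shows why a loop over increasing indices stays
linear overall: each lookup resumes from the last cached position instead
of rescanning from the start of the string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: maps a character index to a
// byte offset in a UTF-8 string, remembering the last position looked up.
class CachedUtf8Index
{
public:
    explicit CachedUtf8Index(const std::string& utf8) : m_str(utf8) {}

    // Returns the byte offset of the character with the given index, or
    // the string size if the index is past the end.
    std::size_t ByteOffset(std::size_t charIndex)
    {
        std::size_t ch = 0, byte = 0;

        // Resume from the cached position if it is not past the target,
        // otherwise fall back to scanning from the beginning.
        if (m_cachedChar <= charIndex)
        {
            ch = m_cachedChar;
            byte = m_cachedByte;
        }

        while (ch < charIndex && byte < m_str.size())
        {
            byte += SeqLen(static_cast<unsigned char>(m_str[byte]));
            ++ch;
            ++m_steps;
        }

        // Guard against a truncated trailing sequence.
        if (byte > m_str.size())
            byte = m_str.size();
        assert(byte <= m_str.size());

        m_cachedChar = ch;
        m_cachedByte = byte;
        return byte;
    }

    // Number of characters skipped over so far (to observe the cache effect).
    std::size_t Steps() const { return m_steps; }

private:
    // Length in bytes of the UTF-8 sequence starting with this lead byte
    // (assumes valid UTF-8, no error checking).
    static std::size_t SeqLen(unsigned char lead)
    {
        if (lead < 0x80) return 1;
        if ((lead >> 5) == 0x6) return 2;
        if ((lead >> 4) == 0xE) return 3;
        return 4;
    }

    std::string m_str;
    std::size_t m_cachedChar = 0;
    std::size_t m_cachedByte = 0;
    std::size_t m_steps = 0;
};
```

 With a single cached position, forward iteration by index costs one step
per character, but any backward access restarts from the beginning. Keeping
a second cached position would cover code that alternates between two
places in the same string.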

 Regards,
VZ


