String encoding problems
Stephan Rose
kermos at somrek.net
Mon Dec 3 13:17:27 PST 2007
On Mon, 2007-12-03 at 21:52 +0100, Vadim Zeitlin wrote:
> On Mon, 03 Dec 2007 10:17:17 +0100 Stephan Rose <kermos at somrek.net> wrote:
>
> SR> So then the other day, I check out windows and realize windows does
> SR> 16-bit for wchar_t. Now the above no longer holds true, nor does my file
> SR> IO code like this as it expects wchar_t to be 32-bit.
>
> Why should it expect this?
Simply because when I implemented the code, wchar_t happened to be
32-bit and I wasn't aware yet that this changes from platform to
platform. To me, a 16-bit wchar_t makes little sense as it cannot hold
every Unicode code point in a single unit. I suppose though this is
precisely why Microsoft chose 16-bit for it on their platform...
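
To make the platform difference concrete, here is a minimal sketch (not
the actual file IO code discussed above) that checks whether a single
wchar_t can hold a given code point on the platform being compiled for.
The name fitsInOneWchar is made up for illustration.

```cpp
#include <limits>

// Hypothetical helper: true if the code point fits in a single wchar_t
// on the platform being compiled for.
constexpr bool fitsInOneWchar(char32_t cp)
{
    return static_cast<unsigned long>(cp)
        <= static_cast<unsigned long>(std::numeric_limits<wchar_t>::max());
}

// Code that assumes a 32-bit wchar_t could state so explicitly, e.g.:
// static_assert(fitsInOneWchar(0x10FFFF), "wchar_t must hold any code point");
```

With a 32-bit wchar_t the commented-out static_assert would pass; with a
16-bit wchar_t it would stop the build instead of silently writing
broken files.
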
>
> Anyhow, whatever you use internally, I do strongly recommend using UTF-8
> for anything external to your program as this becomes a de facto standard
> and doing anything else is more complicated (wxConvAuto does some of this
> for you though).
Yup, that is what I am going to be doing. Not a big deal for me to
change my file format right now. =)
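
For illustration, saving as UTF-8 from a UTF-32 internal representation
could look roughly like the sketch below. It assumes the text is held in
a std::u32string; encodeUtf8 is a hypothetical name, and replacing
invalid values with U+FFFD is just one possible policy.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
// Surrogate code points and values above U+10FFFF are not valid
// Unicode scalar values, so they are replaced with U+FFFD here.
std::string encodeUtf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The bytes written this way are the same regardless of how wide wchar_t
is on the platform that saved the file.
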
>
> SR> I suppose I can save / load via utf-8 but I still have the problem after
> SR> loading with my internal representation.
>
> If by problem you mean lack of surrogates support, then I'm afraid the
> usual solution to this problem in the Windows world is to just ignore it.
Well, since I'm cross-platform, I don't want to run into the problem
where one thing works on one platform and not on the other and as a
result have issues between files saved on different platforms. For
instance, if a Linux user saves a file that happens to contain
characters that need surrogates in UTF-16, and then this gets loaded
under Windows and I ignore it under Windows, things are going to go bad.
I don't really want to be as ignorant as Microsoft is and simply just
pretend that surrogates don't exist. I really don't want to do that.
Especially not in an engineering related application which is more
likely to have text with special characters and more likely to possibly
have surrogates.
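
As a rough sketch of what handling surrogates (rather than ignoring
them) involves when reading 16-bit data: a high surrogate followed by a
low surrogate is combined into one code point. decodeUtf16 is a
hypothetical name, and substituting U+FFFD for unpaired surrogates is an
assumption, not the only reasonable choice.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode UTF-16 code units into UTF-32 code points.
// A high surrogate followed by a low surrogate becomes one code point;
// an unpaired surrogate is replaced with U+FFFD.
std::u32string decodeUtf16(const std::u16string& text)
{
    auto isHigh = [](char16_t u) { return u >= 0xD800 && u <= 0xDBFF; };
    auto isLow  = [](char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; };

    std::u32string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        if (isHigh(unit) && i + 1 < text.size() && isLow(text[i + 1])) {
            const char32_t high = unit - 0xD800;
            const char32_t low = text[i + 1] - 0xDC00;
            out.push_back(0x10000 + (high << 10) + low);
            ++i; // the low surrogate has been consumed as well
        } else if (isHigh(unit) || isLow(unit)) {
            out.push_back(0xFFFD);
        } else {
            out.push_back(unit);
        }
    }
    return out;
}
```

Code that treats every 16-bit unit as a character would see two
"characters" where this function returns one.
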
>
>
> SR> Using wxString isn't really an option in those areas as I'm trying to
> SR> keep most of the affected code api independent.
>
> Err, well this is a laudable idea but it's difficult to recommend anything
> not involving wxString on this list...
Thanks, and honestly I wouldn't want to use any API other than
wxWidgets anyway. That would break my goal of API independence so it'd
make little sense. I could just use wxString in that case, it is a
rather nice class after all. :)
I mean don't get me wrong, I really like wxWidgets and I enjoy using it.
It's got to be the best C++ API I've used so far. But past experiences
have made me very wary of being tied to one particular API (over 50,000
lines of useless .NET Framework code come to mind from when I discovered
that the Framework ultimately won't meet my performance needs).
So for that reason, I'm trying to keep everything that doesn't need an
API for cross platform or GUI reasons API free.
I started implementing my own little String class for use outside of my
wxWidgets related code and that seems to work pretty well. I have
already implemented the various UTF-8/16/32 conversions. The UTF-8/32
conversions I have tested; UTF-16 will have to wait until I test under
Windows. The class uses UTF-32 internally so I shouldn't have any
problems with surrogates.
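
For reference, the UTF-32 to UTF-16 direction that a 16-bit wchar_t
platform would need can be sketched as follows. This is not the String
class mentioned above; encodeUtf16 is a hypothetical name and the U+FFFD
substitution for invalid input is an assumption.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-16 code units.
// Code points above U+FFFF are split into a surrogate pair; invalid
// values are replaced with U+FFFD.
std::u16string encodeUtf16(const std::u32string& text)
{
    std::u16string out;
    for (char32_t cp : text) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            cp = 0xFFFD;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const char32_t v = cp - 0x10000; // 20 bits to split in two
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

Keeping UTF-32 internally means surrogate pairs only ever appear at this
kind of boundary, never in the middle of string manipulation code.
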
Thanks :)
Stephan