[wx-dev] wxRegEx url matching
Michael Wetherell
mike.wetherell at ntlworld.com
Tue Feb 5 02:33:01 PST 2008
On Tuesday 05 February 2008 09:35, Vadim Zeitlin wrote:
> On Tue, 05 Feb 2008 09:02:37 +0100 Robert Roebling wrote:
>
> RR> > Robin Dunn <robin at alldunn.com> wrote:
> RR> >
> RR> > RD> I got a little confused along the way because apparently
> RR> > RD> wxRegEx doesn't support the common \w character class
> RR> >
> RR> > It does when using the built-in version, see
> RR> > src/regex/re_syntax.n. You need to use it in advanced mode,
> RR> > i.e. with wxRE_ADVANCED flag for this.
> RR>
> RR> This is not entirely obvious and mentioning it in the docs would
> RR> probably be a good idea.
>
> Yes, but then we really need some RE syntax description. I'd prefer
> to just put in a link to some page somewhere but I didn't find
> anything in 30 seconds of googling. Maybe we can put re_syntax man
> page on wxwidgets.org and link there?
There is already:
http://wxwidgets.org/manuals/stable/wx_wxresyn.html#wxresyn
The docs on wxRegEx do already link to it, but improvements are welcome
of course.
> RR> Does \w in the built-in library know about Unicode or is this
> RR> ascii or ISO-8859 only?
>
> To be honest I have no idea. I tried reading src/regex/regc_locale.c
> but got lost, it would be probably simpler to just test it.
Yes, it supports Unicode. From the docs:
"Unicode is fully supported only when using the builtin library. When
using the system library in Unicode mode, the expressions and data are
translated to the default 8-bit encoding before being passed to the
library."
"wxWidgets: In a non-Unicode build, these character classifications
depend on the current locale, and correspond to the values returned by
the ANSI C 'is' functions: isalpha, isupper, etc. In Unicode mode they
are based on Unicode classifications, and are not affected by the
current locale."
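To make the non-Unicode case concrete, here is a minimal sketch in plain
C++ (no wxWidgets involved) of how the ANSI C 'is' functions change their
answer with the locale. The helper name count_alpha_in_locale is made up
for this illustration, and this is not how the regex library itself is
implemented, only the behaviour the quoted paragraph refers to:

```cpp
#include <cctype>
#include <clocale>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of wxWidgets): counts the bytes in s that
// isalpha() accepts while LC_CTYPE is set to locale_name.
// Returns -1 if the requested locale is not available on this system.
int count_alpha_in_locale(const char* locale_name,
                          const unsigned char* s, std::size_t n)
{
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    // The returned pointer may be overwritten by later calls, so copy it.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";

    if (!std::setlocale(LC_CTYPE, locale_name))
        return -1;

    int count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (std::isalpha(s[i]))   // s[i] is unsigned char, as isalpha requires
            ++count;

    std::setlocale(LC_CTYPE, saved.c_str());
    return count;
}
```

For the four bytes 'c', 'a', 'f', 0xE9 (e-acute in ISO-8859-1) the "C"
locale counts only the three ASCII letters. Under an installed Latin-1
locale the last byte would normally be counted as well, but locale names
differ between systems, so treat any particular name as an assumption.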
Regards,
Mike