RegEx for Unicode

Fabian Cenedese Cenedese at indel.ch
Thu Feb 1 00:47:46 PST 2007


Hi

This is a regex question, so it may be considered off-topic. But
I'm using the built-in regex implementation with Unicode and was
wondering about an optimization.

I want to evaluate the Chinese dictionary file from http://www.mandarintools.com/cedict.html

The lines are of the form:
Traditional Simplified [pinyin] /English equivalent 1/equivalent 2.../

The first two fields are Chinese HanZi, so [a-zA-Z] won't work. I came
up with this pattern, which works:

wxT("(.*) (.*) \\[(.*)\\] /(.*)/$")
or also
wxT("([^ ]*) ([^ ]*) \\[(.*)\\] /(.*)/$")

But are there better ways to match foreign-language characters in
Unicode text than .* and [^ ]*?
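
For reference, here is a minimal sketch of how the second pattern splits
one line into its fields. To keep it self-contained it uses std::wregex
from the C++ standard library instead of wxRegEx, so it may not behave
exactly like the wx implementation. The CedictEntry struct and the
parseCedictLine function are made-up names for illustration only.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One parsed dictionary line. Hypothetical helper type for this sketch,
// not part of wxWidgets or of the dictionary's own tools.
struct CedictEntry {
    std::wstring traditional;
    std::wstring simplified;
    std::wstring pinyin;
    std::vector<std::wstring> english;
};

// Returns true and fills `out` if `line` has the form
//   Traditional Simplified [pinyin] /English 1/English 2/
// Returns false (leaving `out` untouched) for anything else.
bool parseCedictLine(const std::wstring& line, CedictEntry& out) {
    // Same pattern as the second one above, as a wide-character regex.
    // [^ ]* never needs to know which script the characters belong to;
    // it only stops at the separating space.
    static const std::wregex re(L"([^ ]*) ([^ ]*) \\[(.*)\\] /(.*)/$");

    std::wsmatch m;
    if (!std::regex_match(line, m, re)) {
        return false;
    }
    assert(m.size() == 5);  // whole match plus four capture groups

    out.traditional = m[1].str();
    out.simplified = m[2].str();
    out.pinyin = m[3].str();

    // The fourth group still contains '/' between the English
    // equivalents, so split it by hand.
    out.english.clear();
    const std::wstring glosses = m[4].str();
    std::wstring::size_type start = 0;
    std::wstring::size_type slash;
    do {
        slash = glosses.find(L'/', start);
        const std::wstring::size_type len =
            (slash == std::wstring::npos) ? std::wstring::npos : slash - start;
        out.english.push_back(glosses.substr(start, len));
        start = slash + 1;
    } while (slash != std::wstring::npos);

    return true;
}
```

Called with an illustrative line such as
L"\u4E2D\u570B \u4E2D\u56FD [zhong1 guo2] /China/Middle Kingdom/",
this fills the two HanZi fields, the pinyin, and two English entries.
Lines that do not have this shape are rejected.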

Thanks

bye  Fabi





