Boost users' mailing page: Re: [Boost-users] wregexundefined workaround

From: pps (i-love-spam_at_[hidden])
Date: 2005-03-13 23:24:41

Next message: pps: "Re: [OBORONA-SPAM] Re: [Boost-users] wregexundefined workaround"
Previous message: Jeff Garland: "[Boost-users] [wiki] New content filtering to reduce spam"
In reply to: John Maddock: "Re: [OBORONA-SPAM] Re: [OBORONA-SPAM] Re: [Boost-users] wregexundefined workaround"
Next in thread: pps: "Re: [OBORONA-SPAM] Re: [Boost-users] wregexundefined workaround"
Reply: pps: "Re: [OBORONA-SPAM] Re: [Boost-users] wregexundefined workaround"

> The regex lib doesn't get much bigger, it's the dependency to ICU that
> gets you :-)
>
> I suggest that you read the traits class docs, and then use
> c_regex_traits as an example to work from.
>
> John.

Woo-hoo, I managed to compile the new regex with ICU, but I think it's too
complicated for the little functionality that I need. I'm not doing
anything serious: as an exercise to learn Boost.Regex I wanted to write a
simple app that takes regular expressions from JavaScript (only those of
the form /regex/im) and writes out C++ source code that uses Boost.Regex
to match a string and return a bool. I assume that the input string
is UTF-16 (with no possibility of an extra 2 bytes, just like in JavaScript).
Everything was done and tested until I tried it on FreeBSD, where wregex
didn't exist and where sizeof(wchar_t) is different from VC 7.1.
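
A minimal sketch of the first step described above, splitting a /regex/im literal into its pattern and flags. It uses only the standard library; the names JsRegexLiteral and parse_js_regex_literal are hypothetical and not part of Boost.Regex, and the sketch assumes only the 'i' and 'm' flags are allowed, as stated in the message.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type for illustration only.
struct JsRegexLiteral {
    std::string pattern;       // text between the slashes, escapes untouched
    bool ignore_case = false;  // set by the 'i' flag
    bool multiline = false;    // set by the 'm' flag
};

// Splits "/pattern/flags". Returns false (leaving `out` unchanged) when the
// text has no closing slash, an empty pattern, or a flag other than i / m.
bool parse_js_regex_literal(const std::string& text, JsRegexLiteral& out) {
    if (text.size() < 3 || text[0] != '/') return false;

    // The pattern ends at the first '/' that is neither escaped with a
    // backslash nor inside a [...] character class.
    bool in_class = false;
    std::size_t i = 1;
    for (; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\\') { ++i; continue; }  // skip the escaped character
        if (c == '[') in_class = true;
        else if (c == ']') in_class = false;
        else if (c == '/' && !in_class) break;
    }
    if (i >= text.size() || i == 1) return false;

    JsRegexLiteral result;
    result.pattern = text.substr(1, i - 1);
    for (std::size_t k = i + 1; k < text.size(); ++k) {
        if (text[k] == 'i' && !result.ignore_case) result.ignore_case = true;
        else if (text[k] == 'm' && !result.multiline) result.multiline = true;
        else return false;  // unknown or repeated flag
    }
    out = result;
    return true;
}
```

The generator could then map ignore_case and multiline to whatever construction flags the target regex class expects; that mapping is left out here because it depends on the Boost.Regex version in use.
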
An easier way to get this functionality is to rip the regex part out of
SpiderMonkey (an embeddable JS engine), which borrows its regex code from
Perl as far as I know, or, even easier, just to embed it and use it for
fully compatible regex matching. But I don't need easy routes :)
I tested with JavaScript, and it does a good job with wide strings too.
For example, /^\u03C6+$/i or /^\w+$/i will match "Φφ", correctly
recognizing upper and lower case for Greek phi. I don't really need this
*fancy* handling for chars over 0x7F. The entire JavaScript engine as a
static lib is less than 2M, so ICU seems a bit heavyweight for such simple
functionality. The only extra thing I want to add over the usual
boost::regex is to be able to use \xHHHH or \uHHHH and to have it
operate on 16-bit characters.
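
One way to get the \uHHHH part without touching the regex engine is to decode the escapes before the pattern is compiled. The sketch below is standard-library only; decode_hex_escapes is a hypothetical helper, and it assumes an ASCII pattern on input and the four-digit \xHHHH form mentioned above (JavaScript's own \x escape takes two digits).

```cpp
#include <cstddef>
#include <string>

// Returns 0..15 for a hexadecimal digit, -1 for anything else.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: widens an ASCII pattern to 16-bit code units and
// replaces each \uHHHH or \xHHHH with the single code unit it names.
// Any other escape is copied through unchanged.
std::u16string decode_hex_escapes(const std::string& pattern) {
    std::u16string out;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            const char kind = pattern[i + 1];
            if ((kind == 'u' || kind == 'x') && i + 5 < pattern.size()) {
                unsigned value = 0;
                bool ok = true;
                for (std::size_t k = i + 2; k < i + 6; ++k) {
                    const int h = hex_value(pattern[k]);
                    if (h < 0) { ok = false; break; }
                    value = value * 16 + static_cast<unsigned>(h);
                }
                if (ok) {
                    out.push_back(static_cast<char16_t>(value));
                    i += 5;  // backslash, kind and four digits consumed
                    continue;
                }
            }
            // Copy the escape as a pair so that "\\u0041" is not misread.
            out.push_back(u'\\');
            out.push_back(static_cast<char16_t>(static_cast<unsigned char>(kind)));
            ++i;
            continue;
        }
        out.push_back(static_cast<char16_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

A decoded unit that happens to be a regex metacharacter (for example \u002A, which is '*') would need re-escaping before the pattern reaches the engine; the sketch does not do that.
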
I looked for c_regex_traits and couldn't find the class itself; I only
found a lot of specializations of this template in different places (like
template<> c_regex_traits ...).
