Boost mailing page: Re: [boost] Call for interest for native unicode character and string support in boost

From: Rogier van Dalen (rogiervd_at_[hidden])
Date: 2005-07-23 08:01:46

Next message: Joe Gottman: "Re: [boost] Question about shared_prt doc's"
Previous message: FlSt_at_[hidden]: "[boost] Interest for a perl6-like-junctions-class?"
In reply to: Graham: "[boost] Call for interest for native unicode character and string support in boost"
Next in thread: Erik Wien: "Re: [boost] Call for interest for native unicode character and string support in boost"
Reply: Erik Wien: "Re: [boost] Call for interest for native unicode character and string support in boost"

Hello Graham,

There was a student project aiming to produce a Unicode library, but I
haven't heard anything about it since the thread at
http://listarchives.boost.org/boost/2005/03/22580.php

There are loads of comments and ideas in that thread. Everyone wants a
Unicode library, but no-one seems to have enough time to write it
well. Over the past few weeks I have again been toying with the idea
of trying to write one myself.

You seem to be quite well versed in Unicode. My (hopefully
constructive) comments on your post:
First, are WORD and DWORD the Windows equivalents of uint16_t and
uint32_t, respectively?

I think the C++ way would be to leave the choice of encoding to the
user through a template parameter. That would, I guess, do away with
the separate assign* and insert* methods for the various encodings.
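
A minimal sketch of what an encoding template parameter could look
like. All names here (utf8_tag, utf16_tag, utf32_tag, encode,
unicode_string) are hypothetical and not taken from any existing
proposal, and the code assumes every input is a valid Unicode scalar
value, so it does no error checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoding tags; each one names its code unit type.
struct utf8_tag  { typedef std::uint8_t  code_unit; };
struct utf16_tag { typedef std::uint16_t code_unit; };
struct utf32_tag { typedef std::uint32_t code_unit; };

// Append one code point as UTF-8 (1 to 4 code units).
inline void encode(utf8_tag, std::uint32_t cp, std::vector<std::uint8_t>& out) {
    if (cp < 0x80) {
        out.push_back(static_cast<std::uint8_t>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<std::uint8_t>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<std::uint8_t>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<std::uint8_t>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<std::uint8_t>(0x80 | (cp & 0x3F)));
    }
}

// Append one code point as UTF-16 (1 code unit, or a surrogate pair).
inline void encode(utf16_tag, std::uint32_t cp, std::vector<std::uint16_t>& out) {
    if (cp < 0x10000) {
        out.push_back(static_cast<std::uint16_t>(cp));
    } else {
        cp -= 0x10000;
        out.push_back(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
        out.push_back(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
    }
}

// Append one code point as UTF-32 (always 1 code unit).
inline void encode(utf32_tag, std::uint32_t cp, std::vector<std::uint32_t>& out) {
    out.push_back(cp);
}

// One string class; the encoding is chosen by the user at compile time,
// so a single push_back replaces per-encoding assign*/insert* methods.
template <class Encoding>
class unicode_string {
public:
    typedef typename Encoding::code_unit code_unit;

    void push_back(std::uint32_t code_point) {
        encode(Encoding(), code_point, units_);
    }
    std::size_t code_unit_count() const { return units_.size(); }
    const std::vector<code_unit>& code_units() const { return units_; }

private:
    std::vector<code_unit> units_;
};

// Usage: the same call stores a different number of code units.
inline void encoding_example() {
    unicode_string<utf8_tag> s8;
    unicode_string<utf16_tag> s16;
    s8.push_back(0x1D11E);   // U+1D11E MUSICAL SYMBOL G CLEF
    s16.push_back(0x1D11E);
    assert(s8.code_unit_count() == 4);   // F0 9D 84 9E
    assert(s16.code_unit_count() == 2);  // D834 DD1E
}
```

Tag dispatch is only one way to do this; a traits class or a policy
with static member functions would serve equally well.
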
I think the normalisation form should be an invariant of the string as
well (and also a template parameter). This makes it possible to
implement operator== and operator< as binary comparisons of code
points, so that they will be relatively fast (more so for UTF-8 and
UTF-32, whose code-unit order matches code-point order, than for
UTF-16, where surrogates sort below U+E000..U+FFFF). People will
surely want to use the string as a key in a std::map, for example.
Other, more expensive collation methods (including localised ones)
could be implemented by separate classes.
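
To illustrate why UTF-16 is the awkward case, here is a small sketch
assuming C++11 string types, well-formed input, and strings that are
already in the same normalisation form. The helper names (fixup,
utf16_code_point_less, utf16_code_point_order) are made up for this
example. Plain operator< on std::u16string compares code units, which
puts a surrogate pair before U+FF61; the remapping below restores
code-point order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Remap a UTF-16 code unit so that surrogates sort above U+E000..U+FFFF.
inline std::uint16_t fixup(char16_t u) {
    if (u >= 0xE000) return static_cast<std::uint16_t>(u - 0x800);   // -> D800..F7FF
    if (u >= 0xD800) return static_cast<std::uint16_t>(u + 0x2000);  // -> F800..FFFF
    return static_cast<std::uint16_t>(u);
}

// Code-point ordering for well-formed UTF-16 strings.
inline bool utf16_code_point_less(const std::u16string& a, const std::u16string& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) return fixup(a[i]) < fixup(b[i]);
    }
    return a.size() < b.size();
}

// Comparator so the ordering can be used for std::map keys.
struct utf16_code_point_order {
    bool operator()(const std::u16string& a, const std::u16string& b) const {
        return utf16_code_point_less(a, b);
    }
};

inline void ordering_example() {
    const std::u16string bmp  = u"\uFF61";      // one code unit: FF61
    const std::u16string supp = u"\U00010000";  // surrogate pair: D800 DC00

    assert(supp < bmp);                         // code-unit order: wrong way round
    assert(utf16_code_point_less(bmp, supp));   // code-point order

    // UTF-32 and UTF-8 need no fix-up: binary order is code-point order.
    assert(std::u32string(U"\uFF61") < std::u32string(U"\U00010000"));
    assert(std::string("\xEF\xBD\xA1") < std::string("\xF0\x90\x80\x80"));

    std::map<std::u16string, int, utf16_code_point_order> index;
    index[supp] = 2;
    index[bmp] = 1;
    assert(index.begin()->first == bmp);        // U+FF61 comes first
}
```

If any consistent strict ordering is enough for the map, plain
code-unit comparison works for UTF-16 too; the fix-up only matters
when the order has to agree across encodings.
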
As far as the iterators are concerned, I believe the standard Unicode
string should contain grapheme clusters, and thus its iterator should
have this beast as its value_type. I would call it "character": as far
as the Unicode standard and combining characters are concerned, C++
programmers in general are "users", and grapheme clusters are what
they think of as characters.
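
A rough sketch of what iterating by "character" would yield, assuming
C++11 and UTF-32 input. This is a deliberate simplification: it only
treats the Combining Diacritical Marks block (U+0300..U+036F) as
extending the previous cluster, whereas real grapheme cluster
segmentation follows the rules and property tables of UAX #29. The
names is_combining_mark and split_clusters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified test: only U+0300..U+036F counts as a combining mark here.
inline bool is_combining_mark(char32_t cp) {
    return cp >= 0x0300 && cp <= 0x036F;
}

// Split a UTF-32 string into clusters: a base code point followed by
// any number of combining marks. A leading mark starts its own cluster.
inline std::vector<std::u32string> split_clusters(const std::u32string& text) {
    std::vector<std::u32string> clusters;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (clusters.empty() || !is_combining_mark(text[i])) {
            clusters.push_back(std::u32string());
        }
        clusters.back().push_back(text[i]);
    }
    return clusters;
}

inline void cluster_example() {
    std::u32string text;
    text += U'e';
    text += char32_t(0x0301);  // COMBINING ACUTE ACCENT
    text += U'x';

    const std::vector<std::u32string> clusters = split_clusters(text);
    assert(text.size() == 3);         // three code points...
    assert(clusters.size() == 2);     // ...but two "characters"
    assert(clusters[0].size() == 2);  // 'e' plus the accent
}
```

An iterator with such a cluster as its value_type would compute the
same boundaries lazily instead of building a vector up front.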

Hope this helps.
Rogier
