Subject: Re: [boost] [gsoc] Request Feedback for Boost.Ustr Unicode String Adapter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-09 10:54:08
Soares Chen Ruo Fei wrote:
> A while ago I gave some previews of my Unicode String Adapter library
> to the boost community but I didn't receive much feedback. Now that
> GSoC is ending I'd like you all to take a look at my project again and
> provide feedback on the usefulness of the library. Following are the
> links to my project repository and documentation:
>
> GitHub repository: https://github.com/crf00/boost.ustr
> Documentation: http://crf.scriptmatrix.net/ustr/index.html

I think there are probably as many ways to implement a "better" string 
as there are potential users, and previous long discussions here have 
considered those possibilities at great length.  In summary your 
proposal is for a string that is:
- Immutable.
- Reference counted.
- Iterated by default over Unicode code points.
- Provides access to the code units via operator* and operator->, i.e.
     s.begin()  // Returns a code point iterator.
     s->begin() // Returns a code unit iterator.
I won't comment about the merits or otherwise of those points, apart 
from the last, where I'll note that it is not to my taste.  It looks 
like it's "over clever".  Imagine that I wrote some code using your 
library, and then a colleague who was not familiar with it had to look 
at it later.  Would they have any idea about the difference between 
those two cases?  No, not unless I added a comment every time I used 
it.  Please let's have an obvious syntax like:
     s.begin()       // Code points.
     s.impl.begin()  // Code units.
  or s.units_begin() // Code units.
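
To make the naming suggestion concrete, here is a minimal sketch of a wrapper whose accessors say which view they return. The class name utf8_text and its members are hypothetical and are not part of Boost.Ustr; the sketch assumes the stored bytes are well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical wrapper, for illustration only; not Boost.Ustr's interface.
class utf8_text {
public:
    explicit utf8_text(std::string utf8) : units_(std::move(utf8)) {}

    // Code units: the name makes the view obvious at the call site.
    std::string::const_iterator units_begin() const { return units_.begin(); }
    std::string::const_iterator units_end() const { return units_.end(); }
    std::size_t unit_count() const { return units_.size(); }

    // Code points: every byte that is not a continuation byte (10xxxxxx)
    // starts a new code point. Assumes well-formed UTF-8.
    std::size_t code_point_count() const {
        std::size_t n = 0;
        for (unsigned char c : units_) {
            if ((c & 0xC0) != 0x80) ++n;
        }
        return n;
    }

private:
    std::string units_;
};
```

A reader who has never seen the class can still tell from t.units_begin() and t.code_point_count() which level of the encoding each call works at.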
Personally, I don't want a new clever string class.  What I want is a 
few well-written building-blocks for Unicode.  For example, I'd like to 
be able to iterate over the code points in a block of UTF-8 data in raw 
memory, so some sort of iterator adaptor is needed.  Your library does 
have this functionality, but it is hidden in an implementation detail.  
Please can you consider bringing out your core UTF encoding and 
decoding functions to the public interface?
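
As an illustration of the kind of building block meant here, the following is a rough sketch of decoding code points straight from a block of raw memory. The names utf8_next and decode_utf8 are invented for this example and are not taken from Boost.Ustr. It assumes C++11 and well-formed UTF-8, and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical building block: decode one code point starting at p and
// advance p past it. Assumes well-formed UTF-8; no validation is attempted.
inline char32_t utf8_next(const unsigned char*& p, const unsigned char* end) {
    assert(p != end);
    unsigned char lead = *p++;
    if (lead < 0x80) return lead;                      // 1 byte: 0xxxxxxx
    int trailing = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
    char32_t cp = lead & (0x3F >> trailing);           // payload bits of the lead byte
    while (trailing-- > 0 && p != end) {
        cp = (cp << 6) | (*p++ & 0x3F);                // 6 payload bits per trailing byte
    }
    return cp;
}

// Convenience wrapper: decode a whole block of raw memory into code points.
inline std::vector<char32_t> decode_utf8(const void* data, std::size_t size) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    const unsigned char* end = p + size;
    std::vector<char32_t> out;
    while (p != end) out.push_back(utf8_next(p, end));
    return out;
}
```

Because utf8_next works on plain pointers, it can be used on a memory-mapped file or a network buffer without first copying the data into any string class. A real version would also need to reject malformed sequences.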
I would also like to see some benchmarks for the core UTF conversion 
functions.  If you post some benchmarks that decouple the UTF 
conversion from the rest of the string class, I will compare the 
performance with my own code.
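
A decoupled benchmark could look roughly like the sketch below, which times only the decode loop with std::chrono. Everything named here (decode_timing, decode_one, time_utf8_decode) is hypothetical. decode_one is a placeholder for whichever conversion routine is being measured, and it assumes well-formed UTF-8.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>

struct decode_timing {
    std::size_t code_points;  // code points decoded in one pass
    std::uint32_t checksum;   // sum over all passes; keeps the loop from being optimized away
    double seconds;           // wall-clock time for all passes
};

// Placeholder decoder standing in for the routine under test.
// Assumes well-formed UTF-8; no validation.
inline char32_t decode_one(const unsigned char*& p) {
    unsigned char lead = *p++;
    if (lead < 0x80) return lead;
    int trailing = lead >= 0xF0 ? 3 : lead >= 0xE0 ? 2 : 1;
    char32_t cp = lead & (0x3F >> trailing);
    while (trailing-- > 0) cp = (cp << 6) | (*p++ & 0x3F);
    return cp;
}

// Time `passes` full decodes of the buffer, with no string class involved.
inline decode_timing time_utf8_decode(const std::string& utf8, int passes) {
    const unsigned char* first = reinterpret_cast<const unsigned char*>(utf8.data());
    const unsigned char* last = first + utf8.size();
    decode_timing r{0, 0, 0.0};
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < passes; ++i) {
        r.code_points = 0;
        for (const unsigned char* p = first; p != last; ) {
            r.checksum += decode_one(p);
            ++r.code_points;
        }
    }
    r.seconds = std::chrono::duration<double>(
        std::chrono::steady_clock::now() - start).count();
    return r;
}
```

For meaningful numbers the input should be large and varied (ASCII-heavy text, CJK text, mixed text), and the same buffers should be fed to each implementation being compared.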
Regards,  Phil.