From: Joel de Guzman (joel_at_[hidden])
Date: 2006-07-05 21:23:58
Sean Parent wrote:
> I don't have enough time to delve deeply into this thread but I  
> thought I'd make a few passing comments.
> 
> Adobe has a fairly major string class problem (we joke that every
> project must have its own string class - which is nearly true).
> There isn't such a thing as a single type of string - there are _many_
> purposes and you need to be able to handle things like language and
> style runs and large, large blocks of text with efficient edits, UI
> substitutions (which are aware of things like split negation and
> masculine/feminine forms), language-based ordering, different
> encodings...
> 
> We need another string class like a hole in the head.
> 
> What we do need are good standard algorithms that can be applied
> to any string class.
> 
> I believe this is doable with the current iterator interface.
> 
> I believe it's possible (meaning I've done some quick experiments) to
> define an input iterator (actually as strong as a non-mutating
> forward iterator) and an output iterator that perform encoding
> conversions. This means that you can define operations in terms of
> Unicode, independent of the underlying encoding (though some
> operations such as ordering may still require a locale).
> 
> Consider -
> 
> to_lower(first, last, output)
> to_upper(first, last, output)
> 
> Such transformations can work with any encoding (you can uppercase
> UTF-8 into UTF-32). They can't work in situ (but I don't think
> to_upper or to_lower really can work in situ - certainly not in UTF-8,
> probably not in UTF-16, and I believe there are some multi-character
> forms that even break in UTF-32...). It is possible, though, to wrap
> them with a replace function for in-place operations.
> 
> The current std::find() will work with such iterator adapters to find
> a single UTF-32 character (in any encoded sequence).
> 
> Currently with ASL we're taking such an approach for localization
> strings (replacing an existing string class for localized strings at
> Adobe with a small set of functions that work with _any_ string class,
> i.e. any sequence of code units, including std::string, std::vector,
> std::deque, or std::list).
> 
> You might take a look here for some ideas:
> <http://opensource.adobe.com/group__asl__xstring.html>.
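To make Sean's converting-iterator idea concrete, here is a minimal sketch of a read-only iterator that presents UTF-8 code units as UTF-32 code points, so that the unmodified std::find can locate a single code point in a UTF-8 sequence. The names utf8_to_utf32_iterator and utf32_iter are hypothetical (not ASL or Boost API), and the sketch assumes well-formed UTF-8; it does no validation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical adapter (assumes well-formed UTF-8, no error checking):
// walks UTF-8 code units and yields UTF-32 code points on dereference.
template <typename It>
class utf8_to_utf32_iterator {
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const char32_t*;
    using reference = char32_t;  // decoded on the fly, so returned by value

    utf8_to_utf32_iterator() = default;
    explicit utf8_to_utf32_iterator(It pos) : pos_(pos) {}

    // Decode the code point whose lead byte is at the current position.
    char32_t operator*() const {
        static const unsigned char lead_mask[] = {0x7F, 0x1F, 0x0F, 0x07};
        It p = pos_;
        unsigned char lead = static_cast<unsigned char>(*p);
        int extra = length(lead) - 1;
        char32_t cp = lead & lead_mask[extra];
        while (extra-- > 0) {
            ++p;
            cp = (cp << 6) | (static_cast<unsigned char>(*p) & 0x3F);
        }
        return cp;
    }

    // Advance past every code unit of the current code point.
    utf8_to_utf32_iterator& operator++() {
        int n = length(static_cast<unsigned char>(*pos_));
        while (n-- > 0) ++pos_;
        return *this;
    }
    utf8_to_utf32_iterator operator++(int) {
        utf8_to_utf32_iterator tmp = *this;
        ++*this;
        return tmp;
    }

    It base() const { return pos_; }  // underlying code-unit position

    friend bool operator==(const utf8_to_utf32_iterator& a,
                           const utf8_to_utf32_iterator& b) { return a.pos_ == b.pos_; }
    friend bool operator!=(const utf8_to_utf32_iterator& a,
                           const utf8_to_utf32_iterator& b) { return !(a == b); }

private:
    // Sequence length in code units, determined from the lead byte.
    static int length(unsigned char lead) {
        if (lead < 0x80) return 1;
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        return 4;
    }
    It pos_{};
};

// Hypothetical convenience function to deduce It.
template <typename It>
utf8_to_utf32_iterator<It> utf32_iter(It it) { return utf8_to_utf32_iterator<It>(it); }
```

For example, `std::find(utf32_iter(s.begin()), utf32_iter(s.end()), U'\u00E9')` finds "é" in a UTF-8 std::string, and `.base()` recovers its code-unit position. Because operator* returns by value, the standard's forward-iterator requirements (which want a real reference type) formally make this only an input iterator, even though it supports multiple passes; that seems to be the nuance behind "as strong as a non-mutating forward iterator".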
This is very close to what I have in mind. The main difference is that
the functions/algorithms I envision take ranges instead of iterators.
Thus:
     to_lower(src, dest)
     to_upper(src, dest)
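A minimal sketch of how the range signature could sit on top of Sean's iterator form, assuming `dest` is a container appended to through std::back_inserter (an assumption; the post does not pin down what `dest` is). The case mapping is a hypothetical stand-in covering only ASCII letters and U+00DF, whose full uppercase form is the two-letter "SS"; that length change is why an in-place variant has to go through a replace step, as Sean suggests.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical, deliberately tiny case mapping over UTF-32 code points:
// ASCII a-z plus U+00DF (sharp s -> "SS"). Real code needs Unicode tables.
template <typename In, typename Out>
Out to_upper(In first, In last, Out out) {
    for (; first != last; ++first) {
        char32_t c = *first;
        if (c >= U'a' && c <= U'z') {
            *out++ = static_cast<char32_t>(c - U'a' + U'A');
        } else if (c == U'\u00DF') {
            *out++ = U'S';  // one input code point becomes two
            *out++ = U'S';
        } else {
            *out++ = c;
        }
    }
    return out;
}

// Range form (assumed semantics): read all of src, append results to dest.
template <typename Src, typename Dest>
void to_upper(const Src& src, Dest& dest) {
    to_upper(std::begin(src), std::end(src), std::back_inserter(dest));
}

// In-place wrapper: the output may be longer than the input, so convert
// into a temporary and then replace the original contents with it.
void to_upper_in_place(std::u32string& s) {
    std::u32string tmp;
    to_upper(s, tmp);
    s.replace(s.begin(), s.end(), tmp);
}
```

Since the range form only relies on std::begin/std::end and std::back_inserter, `dest` can be any container with push_back, such as std::u32string or std::vector<char32_t>.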
With these, I could make Fusion-like wrappers that transform them into
something like:
     some_string s1 = to_lower(src);
     some_string s2 = to_upper(src);
where to_lower and to_upper return cheap views that are, in and of
themselves, valid strings/ranges. They are cheap because the actual
conversions/transformations are done on demand (think lazy evaluation).
So, as with expression templates, there are no expensive temporaries
when you perform seemingly expensive tasks like:
     some_string s = f1(f2(f3(f4(src))));
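A minimal sketch of such a lazy view, assuming C++14. The names transform_view, make_view, to_lower, and to_upper here are hypothetical, not Fusion's or any Boost library's API. A view holds its source by reference (or, for a temporary nested view, by value) and applies its function only when an iterator is dereferenced, so nesting views builds no intermediate strings. Case mapping is byte-wise std::tolower/std::toupper, so it is only meaningful for single-byte characters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical lazy view: f is applied to an element only on dereference.
// R is a reference for lvalue sources, a by-value copy for temporary views.
template <typename R, typename F>
class transform_view {
    using base_iter = decltype(std::cbegin(std::declval<R&>()));
public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = std::decay_t<
            decltype(std::declval<const F&>()(*std::declval<base_iter>()))>;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = value_type;  // computed on demand, returned by value

        iterator(base_iter it, const F* f) : it_(it), f_(f) {}
        reference operator*() const { return (*f_)(*it_); }  // work happens here
        iterator& operator++() { ++it_; return *this; }
        iterator operator++(int) { iterator tmp = *this; ++it_; return tmp; }
        bool operator==(const iterator& o) const { return it_ == o.it_; }
        bool operator!=(const iterator& o) const { return it_ != o.it_; }
    private:
        base_iter it_;
        const F* f_;
    };

    transform_view(R&& base, F f) : base_(std::forward<R>(base)), f_(std::move(f)) {}
    iterator begin() const { return iterator(std::cbegin(base_), &f_); }
    iterator end() const { return iterator(std::cend(base_), &f_); }

private:
    R base_;
    F f_;
};

template <typename R, typename F>
transform_view<R, F> make_view(R&& r, F f) {
    return transform_view<R, F>(std::forward<R>(r), std::move(f));
}

// Hypothetical lazy case-mapping views (byte-wise, single-byte text only).
template <typename R>
auto to_lower(R&& r) {
    return make_view(std::forward<R>(r), [](char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    });
}

template <typename R>
auto to_upper(R&& r) {
    return make_view(std::forward<R>(r), [](char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    });
}
```

With this, `auto v = to_upper(to_lower(src)); std::string s(v.begin(), v.end());` does one pass over src with no intermediate string. One design consequence of holding lvalue sources by reference is that a view must not outlive the string it was built from.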
And yes, because they are generic, those string algorithms can work
on any string type that satisfies some basic requirements.
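As an illustration of "basic requirements", here is a hypothetical generic algorithm whose only demand on the string type is that std::begin/std::end yield input iterators over char-like UTF-8 code units. The name count_code_points is made up for this example, and it assumes well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical generic algorithm: counts code points in any range of
// UTF-8 code units by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes well-formed UTF-8; requires only std::begin/std::end.
template <typename Range>
std::size_t count_code_points(const Range& src) {
    std::size_t n = 0;
    for (auto it = std::begin(src); it != std::end(src); ++it) {
        unsigned char u = static_cast<unsigned char>(*it);
        if ((u & 0xC0) != 0x80) ++n;  // every non-continuation byte starts a code point
    }
    return n;
}
```

The same template then works unchanged on std::string, std::vector<char>, std::list<char>, or a plain char array, which is the point of keeping the requirements minimal.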
Regards,
-- Joel de Guzman http://www.boost-consulting.com http://spirit.sf.net