Subject: Re: [boost] [string] Realistic API proposal
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-01-31 04:22:49
> From: Mathias Gaunard <mathias.gaunard_at_[hidden]>
> Subject: Re: [boost] [string] Realistic API proposal
> 
> On 30/01/2011 08:46, Artyom wrote:
> >>
> >> If my strings are valid and normalized, I can compare them with a simple
> >> binary-level comparison;
> >> likewise for substring search, where I may also need to add a boundary check
> >> if I want fine-grain search.
> >>
> >
> > No, you can't.
> >
> > For example, when you search for the word שלום you want to find שָלוֹם as well
> > (with diacritics that are not normalized).
> 
> Unless I understand that wrong, they're as equal as e is equal to é or a
> is equal to à.
> 
Yes, with the small exception that "שָ" in NFC form consists of two code points,
a "base letter" and a "vowel mark", and should compare equal to "ש", the bare
"base letter". By contrast, "à" is a single code point in NFC form, just like "a".
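
To make the code point counts concrete, here is a small sketch. It uses
Python's standard unicodedata module purely for illustration; it is not part
of any proposed string API, and the variable names are mine.

```python
import unicodedata

# shin (U+05E9) followed by the vowel mark qamats (U+05B8)
shin_qamats = "\u05e9\u05b8"
# "a" followed by combining grave accent (U+0300)
a_grave = "a\u0300"

nfc_shin = unicodedata.normalize("NFC", shin_qamats)
nfc_a = unicodedata.normalize("NFC", a_grave)

# No precomposed form exists for shin + qamats, so NFC keeps two code points.
print(len(nfc_shin))  # 2
# a + grave composes to the single code point U+00E0.
print(len(nfc_a))     # 1
```

So after NFC, a binary comparison against the bare base letter fails in both
cases, just with different code point counts.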
> 
> >
> > Search and collation require much more complicated, level-based comparison.
> 
> Right, I'm talking about exact comparison, not collation.
> Exact comparison is what you use in most text processing and parsing.
> 
> You can perform collation folding with the right level if you want those
> two strings to compare equal.
> 
> 
> >
> > The problem is that I may want 00e0 (à) and 0061 0300 (a + `) and 0061 (a) to be
> > equal for string search as well.
> 
> You may, but that should not be the default behaviour of operator== and
> operator<.
> 
The default behavior is binary comparison, but that is not what I'm
looking for. I'm looking for a search/comparison algorithm that can
treat "à" and "a", and "שָ" and "ש", as equal.
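
A rough sketch of the kind of comparison I mean, again in Python only for
illustration. It decomposes to NFD and drops nonspacing marks (general
category Mn), which approximates a base-letters-only comparison. The function
names strip_marks, loose_equal and loose_contains are hypothetical, and this is
not real collation: it ignores locale tailoring, case, and anything beyond
combining marks.

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD and drop nonspacing marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")

def loose_equal(a, b):
    """Compare two strings while ignoring accents and vowel marks."""
    return strip_marks(a) == strip_marks(b)

def loose_contains(haystack, needle):
    """Substring search while ignoring accents and vowel marks."""
    return strip_marks(needle) in strip_marks(haystack)

plain = "\u05e9\u05dc\u05d5\u05dd"               # shin lamed vav final-mem
pointed = "\u05e9\u05b8\u05dc\u05d5\u05b9\u05dd"  # same word with qamats and holam

print(plain == pointed)              # False: binary comparison
print(loose_equal(plain, pointed))   # True
print(loose_equal("\u00e0", "a"))    # True: precomposed a-grave vs. a
print(loose_equal("a\u0300", "a"))   # True: decomposed a-grave vs. a
```

Binary operator== stays as it is; something like this would be a separate
algorithm that the user asks for explicitly.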
Artyom