Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-24 17:04:41


> From: Ryou Ezoe <boostcpp_at_[hidden]>
>
> Number and Date formatting:
> There are so many possible ways to express numbers.
> Some people want comma separation by 3 digits, other want 4 digits.
> Some want to be 100万(万 means 10000). some want 百万(百 means 100)。
> Formatting based on locale doesn't work because there is no uniform format.
>

Have you actually read the manuals?

This is the output of:

  std::cout << bl::format("{1}\n{1,num}\n{1,spell}\n") % 1000000;

in the ja_JP.UTF-8 locale:

  1000000
  1,000,000
  百万

Not so bad, is it?

> Collation and Conversions:
> Japanese doesn't have concepts of case and accent.
> Since we don't have these concepts, we never need it.
>

Irrelevant. Even if this feature is not required for CJK, it is required
for other languages, like many other things (spaces, plural forms).

> Boundary analysis:
> What is the definition of boundary and how does it analyse?
> It sounds too smart for such a small things it actually does.
> I'd rather call it strtok with hard-coded delimiters.
> Japanese doesn't separate each words by space.
> So unless we perform really complicated natural language
> processing(which is impossible to be perfect since we never have
> complete Japanese dictionary),
> we can't split Japanese text by words.

OK, this is the word splitting

  |私|は|日本|の|東京都|に|住|んでいます|。|私|は|大|きな|家|に|住|んでいます|。

of the text:

  私は日本の東京都に住んでいます。私は大きな家に住んでいます。

I assume it is not perfect, and I don't know Japanese well enough to judge,
but I can at least see that words like these are found:

  私 - I
  日本 - Japan
  東京都 - City of Tokyo

So this is not defined only by "space-based" separation.

Also, for some languages like Thai, ICU uses dictionaries. So it is not a
naive algorithm that separates text by spaces.

> Also, Japanese doesn't have a concept of word wrap.
> So "find appropriate places for line breaks" is unnecessary.
> Actually, there are some rules for line break in Japanese.
> These rules are too complicated and it requires more than text processing.
> Same for Chinese and Korean.

This is a possible line-break segmentation of the same sentences:

  |私|は|日|本|の|東|京|都|に|住|ん|で|い|ま|す。|私|は|大|き|な|家|に|住|ん|で|い|ま|す。|

At least I can see that it does not allow a line to start with "。".

>
> Of course, strtok is still a handy tool and I appreciate yet another design.
> But I think it's better be handled by more generic library, like Boost
> String Algorithms.
>

It is far more complicated than strtok.

Bottom line: I see that you haven't really tried to use this library or to
understand how it works.

I'm sorry, but that makes me doubt the review you sent.

Artyom