Subject: Re: [boost] [locale] Review results for Boost.Locale library
From: Artyom (artyomtnk_at_[hidden])
Date: 2011-04-24 17:04:41
> From: Ryou Ezoe <boostcpp_at_[hidden]>
> Number and Date formatting:
> There are so many possible ways to express numbers.
> Some people want comma separation by 3 digits, other want 4 digits.
> Some want to be 100万 (万 means 10000). some want 百万 (百 means 100)。
> Formatting based on locale doesn't work because there is no uniform format.
>
Have you actually read the manuals?
This is the output of
std::cout << bl::format("{1}\n{1,num}\n{1,spell}\n") % 1000000 ;
in the ja_JP.UTF-8 locale:
1000000
1,000,000
百万
Not so bad, is it?
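
As a rough illustration of how a locale facet can drive digit grouping (3 digits versus 4 digits), here is a minimal sketch using only standard C++ facets. It is not how Boost.Locale or ICU implement `{1,num}`; the names `grouping_punct` and `format_grouped` are hypothetical and exist only for this example.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet for illustration: groups digits in blocks of
// `digits_per_group` (3 for 1,000,000 style, 4 for a 10000-based style).
class grouping_punct : public std::numpunct<char> {
public:
    explicit grouping_punct(char digits_per_group)
        : grouping_(1, digits_per_group) {}

protected:
    char do_thousands_sep() const override { return ','; }
    // Each char is a group size; the last one repeats for the remaining digits.
    std::string do_grouping() const override { return grouping_; }

private:
    std::string grouping_;
};

// Formats `value` through a stream imbued with the facet above.
std::string format_grouped(long value, char digits_per_group) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(),
                          new grouping_punct(digits_per_group)));
    out << value;
    return out.str();
}
```

With this sketch, `format_grouped(1000000, 3)` yields `1,000,000` and `format_grouped(1000000, 4)` yields `100,0000`. The grouping comes from the locale attached to the stream, so the calling code stays the same.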
> Collation and Conversions:
> Japanese doesn't have concepts of case and accent.
> Since we don't have these concepts, we never need it.
>
Irrelevant. Even if this feature is not required
for CJK, it is required for other languages, like many
other things (spaces, plural forms).
> Boundary analysis:
> What is the definition of boundary and how does it analyse?
> It sounds too smart for such a small things it actually does.
> I'd rather call it strtok with hard-coded delimiters.
> Japanese doesn't separate each words by space.
> So unless we perform really complicated natural language
> processing(which is impossible to be perfect since we never have
> complete Japanese dictionary),
> we can't split Japanese text by words.
Ok this is word splitting
|私|は|日本|の|東京都|に|住|んでいます|。|私|は|大|きな|家|に|住|んでいます|。
of the text:
私は日本の東京都に住んでいます。私は大きな家に住んでいます。
I assume it is not perfect, and I don't know Japanese well enough
to say, but I can see at least that it finds words like:
私 - I
日本 - Japan
東京都 - City of Tokyo
So this is not defined only by "space-based" separation.
Also, for some languages such as Thai, ICU uses dictionaries.
So it is not a naive algorithm that separates text by
spaces.
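
For contrast, this is roughly what a naive, strtok-like splitter does. It is a minimal sketch in standard C++ only; the function name `split_on_whitespace` is hypothetical and has nothing to do with the Boost.Locale API.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Naive tokenizer: splits only where whitespace occurs, much like strtok
// with a hard-coded delimiter set.
std::vector<std::string> split_on_whitespace(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Given "I live in Tokyo" this returns four tokens, but given the UTF-8 encoded sentence 私は日本の東京都に住んでいます。 it returns the whole sentence as a single token, because the sentence contains no spaces. The word splitting shown above cannot come from this kind of approach.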
> Also, Japanese doesn't have a concept of word wrap.
> So "find appropriate places for line breaks" is unnecessary.
> Actually, there are some rules for line break in Japanese.
> These rules are too complicated and it requires more than text processing.
> Same for Chinese and Korean.
This is a possible line-break separation of the same sentences as above.
|私|は|日|本|の|東|京|都|に|住|ん|で|い|ま|す。|私|は|大|き|な|家|に|住|ん|で|い|ま|す。|
At least I can see that it does not allow a line to start with "。".
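
The single rule visible in that output can be sketched as follows. This is only a toy under stated assumptions: it assumes valid UTF-8 input, allows a break between any two characters, and forbids a break only before "。" or "、". The real rules ICU follows are far more extensive, and the names `utf8_length` and `line_break_segments` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence that starts with lead byte `b`
// (assumes the input is valid UTF-8).
static std::size_t utf8_length(unsigned char b) {
    if (b < 0x80) return 1;
    if ((b >> 5) == 0x6) return 2;
    if ((b >> 4) == 0xE) return 3;
    return 4;
}

// Splits text into segments between which a line break is allowed.
// Toy rule: break after every character, except that "。" and "、"
// must stay attached to the preceding character.
std::vector<std::string> line_break_segments(const std::string& text) {
    std::vector<std::string> segments;
    for (std::size_t i = 0; i < text.size();) {
        const std::size_t n = utf8_length(static_cast<unsigned char>(text[i]));
        const std::string ch = text.substr(i, n);
        const bool may_not_start_line = (ch == "。" || ch == "、");
        if (may_not_start_line && !segments.empty()) {
            segments.back() += ch;   // glue punctuation to the previous segment
        } else {
            segments.push_back(ch);
        }
        i += n;
    }
    return segments;
}
```

For the first sentence, 私は日本の東京都に住んでいます。, this produces 15 segments, the last of which is す。, which matches the shape of the output above for that sentence.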
>
> Of course, strtok is still a handy tool and I appreciate yet another design.
> But I think it's better be handled by more generic library, like Boost
> String Algorithms.
>
It is far more complicated than strtok.
Bottom line: I see that you haven't really tried
to use this library or understand how it
works.
I'm sorry, but it makes me doubt the review
you sent.
Artyom