Subject: Re: [boost] [review] Formal review period for Sort library continuing through November 19
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2014-11-20 13:03:53


Steven Ross wrote:

>> How is Get_char
>> supposed to ensure all by itself that all first characters of all book
>> titles are considered in the same radix pass, i.e. the one after the last
>> character of the longest author name?
>>
> By alternating between returning payload radix characters and a "does this
> field continue" byte. A 0 will specify an end of field, and sort lower in
> the order than a 1, thus making all shorter names sort lower than names
> that are equivalent to the same length but longer.
> Thus, Author: Jon Title: bat will have GetChar output these values:
> 1,'J',1,'o',1,'n',0 ,1,'b',1,'a',1,'t',0
>
> Author: Jonathan Title:(empty), will output these values:
> 1,'J',1,'o',1,'n',1,'a',1,'t',1,'h',1,'a',1,'n',0, 0
>
> The 0 vs 1 will cause Jon to appear before Jonathan in order, because it is
> shorter. Actually, the 0s and 1s aren't necessary for the last subkey (the
> title), but I included them to show this approach.
>
> There is also a simple GetLength functor that lets string_sort know when
> all characters are exhausted.
>
>>
>> By contrast, it is very clear how refining the comparison function would
>> be sufficient for a comparison sort. Just compare by author; in case of a
>> tie, compare by title.
>>
>
> That's exactly what the Get_char method I explained is doing, but it's less
> intuitive. The radix equivalent requires encoding the same information the
> comparison function uses, but sometimes in a different byte format to
> include factors such as sizes and signs. As I said, doable, but may not be
> worth the effort.
>
>
>> To the best of my knowledge, it is common wisdom that radix sorts are not
>> as general as comparison sorts. I can be wrong. I may have missed an
>> important (new?) insight. If that is the case, please explain to me how
>> refining the Get_char function is sufficient to deal with any strict weak
>> ordering (just repeating the claim does not help me understand).
>>
>> I agree with your statement that this is conventional wisdom.
> 2 major factors are included in this conventional wisdom:
> 1) Most Radix sorts perform poorly relative to comparison sorting for small
> N, large K, or variable-length data. Hybrid-radix sorting resolves this
> problem.
> 2) The mapping from a sorting requirement to a comparison function is more
> intuitive and generally simpler than the mapping to a Get_char-type radix
> implementation. With #1 limiting performance, and comparison-based sorts
> being fast enough for most problems, I think most people just didn't put in
> the effort to examine the possibility of generalized radix sorting.
> string_sort provides this capability. It won't be optimal for many types
> of problems, but it can be used for them and provides worst-case
> performance guarantees.

Thank you, Steven. It clicked. I guess I just needed a “explain like I’m
10” type of explanation.

Issue resolved, as far as I’m concerned.

Cheers,
Julian