$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
From: Peter Dimov (pdimov_at_[hidden])
Date: 2005-03-10 12:36:25
Dave Harris wrote:
> In-Reply-To: <018901c52569$ea842610$6601a8c0_at_pdimov>
> pdimov_at_[hidden] (Peter Dimov) wrote (abridged):
>> That would be bad because the hash value of the pair (x, y) would no
>> longer match the hash value of the sequence (x, y).
>
> Why is it necessary for pairs and sequences to have the same hash
> value?
> They are different things.
pair<int, int>, int[2], struct { int a; int b; } are different 
representations of the same thing; it is not strictly necessary for them to 
use the same algorithm, but it's desirable and will eliminate a source of 
confusion and mistakes.
> If it is necessary, we can avoid the problem by fixing hash_range:
>
>    template <typename It>
>    size_t hash_range( It first, It last ) {
>        if (first == last)
>            return 0x87654321; // Or some other value;
>        size_t seed = hash_value(*first);
>        for (++first; first != last; ++first)
>            seed = hash_combine( seed, *first );
>        return seed;
>    }
The simpler "initialize seed, modify seed with each element" approach is 
much cleaner. It can also be expressed in terms of
    void hash_range( size_t & seed, It first, It last );
should this function be added. You are welcome to spell the above as
    size_t hash_range( size_t seed, It first, It last );
and I understand your reasons for doing so, I just don't see side effects as 
so evil. Anyway, the good thing in this formulation is that you can hash a 
range in stages, if you only have access to a subrange at a time. The 
non-uniform treatment of empty sequences and the first element of the 
sequence would force you to keep state. State is even more evil than side 
effects. ;-)
> This also gives hash_range( &a, &a+1 ) the same result as
> hash_value(a), which your approach does not.
Yes, you are right, starting with a nonzero seed makes sequences of one 
element produce a different hash value than the element itself.
>> Do you have a different hash_combine in mind that doesn't suffer from
>> this problem and is better than the proposed implementation? ;-)
>> (Keeping in mind that even the current hash_combine is perceived as
>> slower than necessary, of course.)
>
> I had in mind something like:
>
>    template <typename T>
>    size_t hash_combine( size_t seed, const T &v ) {
>        const size_t c = 0x12345678; // Or some other value.
>        return seed ^ (hash_value(v) + c + (seed<<6) + (seed>>2));
>    }
>
> This costs one extra addition, which can probably be scheduled in
> parallel with other operations anyway.
This may be good enough.