$include_dir="/home/hyper-archives/boost-users/include"; include("$include_dir/msg-header.inc") ?>
From: Bill Somerville (bill_at_[hidden])
Date: 2008-04-03 18:16:59
Robert Dailey wrote:
> On Thu, Apr 3, 2008 at 4:16 PM, Robert Dailey <rcdailey_at_[hidden] 
> <mailto:rcdailey_at_[hidden]>> wrote:
>
>     On Thu, Apr 3, 2008 at 3:47 PM, Steven Watanabe
>     <watanabesj_at_[hidden] <mailto:watanabesj_at_[hidden]>> wrote:
>
[...snip...]
>
>         > Also, the definition of 'bucket' eludes me. If someone could
>         explain
>         > what buckets are in relation to unordered associative
>         containers I'd
>         > appreciate it.
>
Buckets containers for the elements with equal hash values.
>
>
>         Does the following (very naive) example implementation give
>         some idea of
>         how a hash table works?
>
>         const int num_buckets = 167;
>         std::vector<int> buckets[numBuckets];
>
>         void insert(int x) {
>            buckets[hash(x) % numBuckets].push_back(x);
>         }
>
Note that insert doesn't overwrite bucket - it adds the new element to 
the bucket which is a vector.
>
>
>         In Christ,
>         Steven Watanabe
>
>
>     According to your example, what happens if I do:
>
>     insert(167);
>     insert(334);
>
>     ???
>
>
>
> The above is assuming that the hash() function does:
>
> hash(x) { return x; }
The hash function maps an element key with a number, all equal keys must 
produce the same number, hash functions are a many to one mapping usually.
> insert(167);
> insert(334);
>
> the above code (second insert) would cause the same element in the 
> vector to be overwritten. The math is a little bit odd too. It doesn't 
> really answer a lot of questions. The inserted values may also be 
> strings as well, however in your example it appears as if the hash 
> function only handles integral values.
Elements are distributed across the buckets, if more than one element 
maps to the same bucket then that is termed a collision. If the hash 
function is perfect then each bucket will contain 1 element when the 
hash map is full and lookup is O(1), too many elements (or too few 
buckets) or an imperfect hash function cause multiple elements in one or 
more buckets; tending to O(n) lookup (like a vector) in the worst case 
where all elements are in one bucket.
Choose a combination of hash function and number of buckets that gives 
the required performance.
HTH
-- Bill Somerville Class Design Limited