Subject: Re: [boost] Interest in a Bloom Filter?
From: Milosz Marian HULBOJ (mhulboj_at_[hidden])
Date: 2009-06-15 07:21:29


Dean Michael Berris wrote:
> Hi Milosz,
>
> On Mon, Jun 15, 2009 at 2:49 PM, Milosz Marian HULBOJ<mhulboj_at_[hidden]> wrote:
>> - It would be nice to have the option of using `double-hashing' schema,
>> where all the hashes are generated according to:
>> hash(i) = h1(x) + i*h2(x)
>> or the squared/cubed variant:
>> hash(i) = h1(x) + i*h2(x) + i*i (*i)
>>
>
> This is interesting. I'd think this would have to be implemented by
> the user of the bloom_filter in the HashFunctions template parameter.
>
>> - one hash value can be potentially used to generate many hash values. For
>> example, if you need 16-bit hashes, then from the hash function producing
>> 32-bit hash-value you could get 2 values. This can be in turn combined with
>> the previous point - double hashing.

Hi Dean,

The bottom line of the two points above is that in order to hash for the
bloom filter (in the double-hashing schema) one only needs to:
- invoke hash function once
- split the bits and do few multiplications and additions
Still I didn't have time to look into yours implementation to see
whether the above can be used seamlessly.

There are also variants of double hashing to be more cache-line
friendly, but personally the idea presented above was sufficient for my
needs (for the hashing functions I've been using the TR1 hash and murmur
hash).

Regards,
Milosz