Subject: Re: [boost] Request interest in Bloom filters
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-03-23 09:16:15


vicente.botet wrote:
> I need for my application bloom filters. Do you know of a good boostified
> generic implementation?
> Is there any interest in such a generic bloom filters library in Boost?

Hi Vicente,

I have a simple generic Bloom filter here:

     http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh

This is GPL licensed but I'd be happy to change that to the Boost
license if someone wanted to make a Boost library using it.

After writing this code I eventually decided to use a different method,
so I'm not currently using this code anywhere.

The code is very straightforward. I spent much longer learning the
theory of Bloom filters than I did implementing it. Unfortunately I've
now forgotten much of the theory and all that is left is the code...

What's not included is hash functions; you have to supply these as a
template parameter. I used Boost.CRC and the hash function from std::
unordered containers. I seem to have lost the example code that did
this. One general problem with this approach is to find enough
independent hash functions. Say I'm storing strings and I need four
hash functions; the temptation is to have one function and to compute
e.g. f(x), f("123"+x), f("456"+x), f("789"+x). These will give
different answers and may look like they're independent functions, but
it's not really certain how independent they are.

Cheers, Phil.