$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] Request interest in Bloom filters
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-03-23 09:16:15
vicente.botet wrote:
> I need for my application bloom filters. Do you know of a good boostified 
> generic implementation?
> Is there any interest in such a generic bloom filters library in Boost?
Hi Vicente,
I have a simple generic Bloom filter here:
     http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh
This is GPL licensed but I'd be happy to change that to the Boost 
license if someone wanted to make a Boost library using it.
After writing this code I eventually decided to use a different method, 
so I'm not currently using this code anywhere.
The code is very straightforward.  I spent much longer learning the 
theory of Bloom filters than I did implementing it.  Unfortunately I've 
now forgotten much of the theory and all that is left is the code...
What's not included is hash functions; you have to supply these as a 
template parameter.  I used Boost.CRC and the hash function from std:: 
unordered containers.  I seem to have lost the example code that did 
this.  One general problem with this approach is to find enough 
independent hash functions.  Say I'm storing strings and I need four 
hash functions; the temptation is to have one function and to compute 
e.g. f(x), f("123"+x), f("456"+x), f("789"+x).  These will give 
different answers and may look like they're independent functions, but 
it's not really certain how independent they are.
Cheers,  Phil.