$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] Request interest in Bloom filters
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-03-23 09:16:15
vicente.botet wrote:
> I need for my application bloom filters. Do you know of a good boostified
> generic implementation?
> Is there any interest in such a generic bloom filters library in Boost?
Hi Vicente,
I have a simple generic Bloom filter here:
http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh
This is GPL licensed but I'd be happy to change that to the Boost
license if someone wanted to make a Boost library using it.
After writing this code I eventually decided to use a different method,
so I'm not currently using this code anywhere.
The code is very straightforward. I spent much longer learning the
theory of Bloom filters than I did implementing it. Unfortunately I've
now forgotten much of the theory and all that is left is the code...
What's not included is hash functions; you have to supply these as a
template parameter. I used Boost.CRC and the hash function from std::
unordered containers. I seem to have lost the example code that did
this. One general problem with this approach is to find enough
independent hash functions. Say I'm storing strings and I need four
hash functions; the temptation is to have one function and to compute
e.g. f(x), f("123"+x), f("456"+x), f("789"+x). These will give
different answers and may look like they're independent functions, but
it's not really certain how independent they are.
Cheers, Phil.