$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] Re quest interest in Bloom filters
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2009-03-23 12:15:01
Vicente Botet Escriba wrote:
> Phil Endecott-48 wrote:
>>
>> vicente.botet wrote:
>>> I need for my application bloom filters. Do you know of a good boostified
>>> generic implementation?
>>> Is there any interest in such a generic bloom filters library in Boost?
>>
>> Hi Vicente,
>>
>> I have a simple generic Bloom filter here:
>>
>> http://svn.chezphil.org/libpbe/trunk/include/bloom_filter.hh
>> I seem to have lost the example code
Now found and checked in here:
http://svn.chezphil.org/libpbe/trunk/examples/spellcheck.cc
> The hard part (the hash functions) is left out quite elegantly. I was
> thinking about defining some hash traits wich will allow the user to set the
> hash functions in a independent way. Something like
>
> template <template T, size_t N> struct hash_traits;
>
> template <> hash_traits<SpecificUserType,0> {
> static unsigned hash(T& value) {
> // specific code
> }
> }
>
> BTW, besides the tr1::hash function, are there other hash generic functions
> in Boost?
You can use Boost.CRC. What sort of key do you have? I once did some
investigation of hash functions for strings for use with unordered
containers; I was considering using a degenerate hash that only looks
at the first N characters.
Something I didn't mention before is that if you need, say, 10-bit hash
values then a 32-bit CRC could give you three of them. In this case
your hash_traits would presumably compute the same CRC three times and
extract different portions each time, which is not ideal.
> I would need also the union and intersection functions. I'm wondering if
> replacing the tr1::array by a boost::dynamic_bitset should make eassier
> this.
Well it's not exactly difficult with the array. I've never used
dynamic_bitset and being dynamic is not a requirement here. I was
concerned (as normal) about getting a fast implementation and the
simple bit manipulation in my code is probably about as fast as you can get.
Cheers, Phil.