Subject: Re: [boost] [Feedback] Towards a Better Boost.BloomFilter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-25 17:27:03


Hi Alejandro,

Alejandro Cabrera wrote:
> Hello all,
>
> The GSoC coding period is over. However, I have every intent to continue to
> develop, improve, and maintain the Boost.BloomFilter package.
>
> Your feedback last time around was very useful, and made me think very
> carefully about the operations supported by each Bloom filter variant
> implemented. Thank you.
>
> Now, I could use feedback on at least the following aspects:
> - Are all the operations you need implemented by Boost.BloomFilter?

There is still no "adaptor" functionality, so I can't mmap() a file and
use it as the Bloom filter's raw content, as I might want to do for,
e.g., a spellchecker or a URL blocklist.
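
A minimal sketch of what such an adaptor could look like, assuming a
non-owning view over caller-supplied bytes. The class name bloom_view,
its members, and the seeded FNV-1a hashing are hypothetical and are not
part of Boost.BloomFilter; the buffer could just as well be a region
returned by mmap().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical non-owning Bloom filter: the bits live in memory that the
// caller provides (a static array, an mmap()ed file, ...), not in the filter.
class bloom_view {
public:
    bloom_view(unsigned char* bytes, std::size_t num_bits)
        : bytes_(bytes), num_bits_(num_bits) {
        assert(bytes != nullptr && num_bits > 0);
    }

    void insert(const std::string& key) {
        for (unsigned i = 0; i < num_hashes; ++i) {
            const std::size_t bit = index(key, i);
            bytes_[bit / 8] |= static_cast<unsigned char>(1u << (bit % 8));
        }
    }

    bool probably_contains(const std::string& key) const {
        for (unsigned i = 0; i < num_hashes; ++i) {
            const std::size_t bit = index(key, i);
            if ((bytes_[bit / 8] & (1u << (bit % 8))) == 0) return false;
        }
        return true;
    }

private:
    static const unsigned num_hashes = 3;  // illustrative choice of k

    // Seeded 64-bit FNV-1a. A fixed, documented hash matters here because
    // the bit positions must be reproducible by whoever reads the file later.
    std::size_t index(const std::string& key, unsigned seed) const {
        std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
        h = (h ^ seed) * 1099511628211ull;          // mix in the seed
        for (std::size_t i = 0; i < key.size(); ++i)
            h = (h ^ static_cast<unsigned char>(key[i])) * 1099511628211ull;
        return static_cast<std::size_t>(h % num_bits_);
    }

    unsigned char* bytes_;
    std::size_t num_bits_;
};
```

Two views constructed over the same buffer see the same contents, which
is the property the mmap() use case relies on. A read-only mapping would
want a variant that takes const unsigned char* and omits insert().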

> Serialization support is still missing, though at least now it is possible
> to access the underlying storage for each structure implemented through the
> data() member function.

data() returns a std::bitset, but that doesn't provide access to its
data in a form that I can write to a file (e.g. in preparing the data
for the above examples). I consider this a fault of std::bitset. I
believe you should use a std::vector or array instead.
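
To show why contiguous storage helps, here is a rough sketch assuming
the filter's bits are kept in a std::vector<unsigned char>. The names
bit_storage, save_bits, load_bits and round_trips are made up for
illustration and are not the library's interface.

```cpp
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

typedef std::vector<unsigned char> bit_storage;  // hypothetical storage type

// Contiguous bytes can be written with a single call.
void save_bits(const bit_storage& bits, std::ostream& out) {
    assert(out.good());
    if (!bits.empty())
        out.write(reinterpret_cast<const char*>(&bits[0]),
                  static_cast<std::streamsize>(bits.size()));
}

// Reads num_bytes back; returns an empty vector if the stream runs short.
bit_storage load_bits(std::istream& in, std::size_t num_bytes) {
    bit_storage bits(num_bytes);
    if (num_bytes != 0)
        in.read(reinterpret_cast<char*>(&bits[0]),
                static_cast<std::streamsize>(num_bytes));
    if (!in) bits.clear();
    return bits;
}

// Usage example: write to an in-memory stream and read the bytes back.
bool round_trips(const bit_storage& bits) {
    std::stringstream buffer(std::ios::in | std::ios::out | std::ios::binary);
    save_bits(bits, buffer);
    return load_bits(buffer, bits.size()) == bits;
}
```

The same two functions work with a std::ofstream or std::ifstream opened
with std::ios::binary. std::bitset offers no comparable pointer to its
underlying words, so the equivalent code has to copy bit by bit.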

I see that you still provide intersection without discussing its
implications for the error rate.
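
A toy illustration of the concern, assuming intersection is implemented
as a bitwise AND of the two bit arrays (an assumption about the
implementation). The 8-bit filter and its two hash functions are
deliberately simplistic and hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Toy filter: 8 bits, k = 2, with hand-picked hypothetical hashes so the
// bit positions are easy to follow.
struct toy_filter {
    std::bitset<8> bits;
    static std::size_t h1(unsigned x) { return x % 8; }
    static std::size_t h2(unsigned x) { return (x / 8) % 8; }
    void insert(unsigned x) { bits.set(h1(x)); bits.set(h2(x)); }
    bool probably_contains(unsigned x) const {
        return bits.test(h1(x)) && bits.test(h2(x));
    }
};

// Assumed implementation of intersection: AND the two bit arrays.
toy_filter intersect(const toy_filter& a, const toy_filter& b) {
    toy_filter r;
    r.bits = a.bits & b.bits;
    return r;
}

// The sets {17} and {10} are disjoint, yet the AND-ed filter reports 17.
bool intersection_overreports() {
    toy_filter a, b;
    a.insert(17);  // sets bits 1 and 2
    b.insert(10);  // sets bits 2 and 1
    assert(a.probably_contains(17));
    const toy_filter both = intersect(a, b);
    return both.probably_contains(17);
}
```

A filter built directly from the true intersection (here the empty set)
would have no bits set. The AND-ed filter reports every key that tests
positive in both inputs, including keys that are really in one set and
merely a false positive in the other, so its error rate relative to the
true intersection is generally higher than that of a freshly built
filter.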

Your dynamic_bitset has a resize() method. Was that there before? I
don't understand how it can work in any useful way.
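
One way to see the problem, assuming bit positions are computed as the
hash modulo the current size (an assumption, with a hypothetical toy
filter rather than the library's class):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-hash filter whose bit index depends on the current size.
struct resizable_toy_filter {
    std::vector<bool> bits;

    explicit resizable_toy_filter(std::size_t n) : bits(n, false) {
        assert(n > 0);
    }
    std::size_t index(unsigned x) const { return x % bits.size(); }
    void insert(unsigned x) { bits[index(x)] = true; }
    bool probably_contains(unsigned x) const { return bits[index(x)]; }

    // Keeps the existing bits and appends zeros, as a plain resize would.
    void resize(std::size_t n) {
        assert(n > 0);
        bits.resize(n, false);
    }
};

// Returns true if a key that was found before resizing is lost afterwards.
bool resize_loses_element() {
    resizable_toy_filter f(8);
    f.insert(9);                                  // sets bit 9 % 8 == 1
    const bool before = f.probably_contains(9);
    f.resize(16);                                 // bit 1 stays set
    const bool after = f.probably_contains(9);    // now tests bit 9 % 16 == 9
    return before && !after;
}
```

Under that assumption a resize turns previously inserted keys into
false negatives, which a Bloom filter is never supposed to produce. The
original keys are not stored, so they cannot be rehashed into the new
size either.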

The final documentation should mention the complexity of each method.

Regards, Phil.