Subject: Re: [boost] [Feedback] Towards a Better Boost.BloomFilter
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-08-30 16:13:21
Alejandro Cabrera wrote:
> Phil,
>
> Phil Endecott-48 wrote:
>>
>> It is normally preferable to pass a begin-end pair rather than address
>> and length, but fundamentally yes I would like to be able to construct
>> a read-only bloom filter from a pair of const_iterators i.e. const
>> pointers in this case.
>>
>
> The input iterator constructor is already available.
No, I don't think so. What you have is a ctor that iterates over a
sequence of values and inserts them:
    template <typename InputIterator>
    basic_bloom_filter(const InputIterator start, const InputIterator end) {
      for (InputIterator i = start; i != end; ++i)
        this->insert(*i);
    }
What I want is a way to adapt a chunk of raw memory, which I've got
from mmap(), and to treat it as a bloom filter. I haven't yet seen any
"adaptor" functionality in your code.
> All that remains is what you describe below.
>
> Phil Endecott-48 wrote:
>>
>> I don't care about serialisation. I just want to be able to
>>
>> const T* p = &(*(bloom_filter.data().begin()));
>> size_t len = sizeof(T) * bloom_filter.data().size();
>> write(fd,p,len);
>>
>> Regards, Phil.
>>
>
> All this should take is changing the underlying storage type to something
> that supports the operations begin() and size(). The intent is to use
> std::vector, as with the dynamic counting Bloom filter implementations. The
> operations you've described above are serialization. This will be supported
> as soon as I can work on the basic Bloom filters. I'll add a test to
> my suite to make sure there is a way to support this in both directions
> (write/read, store/load).
I really don't want to "load" the filter, i.e. to do something that
needs O(N) time and O(N) memory. I just want to mmap the file into
memory and "declare" the bloom filter into existence in O(1) time.
Then when I eventually use it, pages will be faulted in on demand. The
aim is to ensure that I can have a perhaps very large filter and still
perform a small number of queries quite quickly.
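Roughly, the usage I'm after looks like this (again only a sketch, assuming
an adaptor along the lines above; error handling and the choice of hash
count are glossed over):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <string>

    bool query(const char* filename, const std::string& key)
    {
      int fd = open(filename, O_RDONLY);
      struct stat st;
      fstat(fd, &st);

      // O(1): nothing is read here; pages fault in only when probed.
      const unsigned char* bits = static_cast<const unsigned char*>(
          mmap(0, st.st_size, PROT_READ, MAP_SHARED, fd, 0));

      bloom_filter_view<std::string> filter(bits, st.st_size * 8,
                                            /* num_hashes = */ 4);
      const bool hit = filter.probably_contains(key);

      munmap(const_cast<unsigned char*>(bits), st.st_size);
      close(fd);
      return hit;
    }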
Regards, Phil.