From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-05-22 17:09:23


El 22/05/2025 a las 16:29, Vinnie Falco via Boost escribió:
> [...]
>
> Documentation is a mixed bag. I can't seem to find very important things
> like the minimum version of C++ required, or which compilers it supports,
> like I can find here:
>
> https://www.boost.org/doc/libs/latest/libs/json/doc/html/json/overview.html#json.overview.requirements

Peter Turcan also asked for a getting started section along these lines.
I will add it
as a subsection to the introduction.

> The content of the documentation is high quality but the layout is in my
> opinion terrible. I find it hard to read these class synopses and the
> styling is an impediment to legibility. This is not a problem specific to
> Boost.Bloom. Hopefully some of the work the C++ Alliance is doing in the
> area of documentation will empower authors to do better.

My plan is, if the library is accepted, to ask for help to port this to
Antora
(aka Boost.Unordered style).

> The inclusion of the Visual Studio visualizer ("natvis file") is a very
> nice touch and an indicator of quality. Hopefully we will see the
> corresponding gdb pretty printer. Or perhaps someone might contribute it.

I already have in mind who I'll ask to do this :-)

> Heads up, I am quite familiar with Bloom filters in terms of implementation
> experience although Joaquin is way ahead of me on the math and
> optimization. Bloom filters are very common in decentralized network
> programs such as Gnutella, BearShare, LimeWire, and now bitcoind. For
> example, thin bitcoind nodes can provide their directly connected peers
> with a compressed bloom filter, to reduce broadcast message volume to the
> subset of interested transactions:
>
> https://bitcoinops.org/en/topics/transaction-bloom-filtering
>
> Before the review period started, I had mentioned at least twice that it
> would be nice to see a demo of Boost.Bloom data structures patched into
> some other popular software which already uses these filters. A quick
> review of the repository turns up these leads which could provide for
> interesting study:
>
> https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34a4dd/src/leveldb/util/bloom.cc#L17
> https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34a4dd/src/common/bloom.h#L108
> https://github.com/bitcoin/bitcoin/blob/35bf3f88398da0211cece0a29da4de001e34a4dd/src/common/bloom.cpp#L162
>
> The "rolling bloom filter" referenced above is a slightly different data
> structure from what Boost.Bloom offers, and I can't help but wonder if this
> "CRollingBloomFilter" can be implemented in terms of Boost.Bloom. If the
> Boost.Bloom powered version of bitcoind proves more efficient than its
> current implementation, it will be adopted by that project as such
> efficiencies contribute to increased decentralization.

In fact I'm somewhat surprised most reviewers haven't asked for variations
of/alternatives to Bloom filters. This can be added to the roadmap if
the library
gains traction, I guess.

> The example in the README could be improved by declaring the filter thusly:
>
> using filter = boost::bloom::filter<boost::core::string_view, 5>;
>
> I don't think it is a good idea to give users an example which performs
> unnecessary copying or allocation.

This is not needed cause the filter (with the exception of emplace) does
not create
elements on its own --you comment on this later in the review.

> I'm puzzled about the terminology. "insert" and "emplace" imply that new
> nodes or array elements of type value_type are being stored. This is not
> the case though, as these objects are hashed and the resulting digest is
> used to algorithmically adjust the filter which is in essence an array of
> bool. `emplace` documentation says that it constructs elements.

The interface mimics that of a container as a way to lower the entry barrier
for new users learning it. But filters are _not_ containers, as you've found
out. Claudio also mentioned this. I'll stress the point in some
prominent place in
the documentation.

BTW, emplace() does construct the element, calculates its hash, and then
throws
it away.

> This talk
> of subfilters is confusing and sounds like an implementation detail. And I
> trust that whatever is there is necessary, as Joaquin is not prone to
> frivolities.

I'll expand on this in the docs. Subfilters are the way the library
allows users
to configure the filter according to different FPR/speed/size tradeoffs.
Ruben
Perez and Ivan Matek also requested more clarity in this area.

> I think that the array() function(s) need to guarantee that bits beyond the
> capacity but present in the storage will be zeroed out. This way they can
> be serialized deterministically.

Not sure I'm getting you, but the extent of the span returned by array()
is exactly
as wide as capacity() indicates.

> I looked through the implementation, the optimizations are a bit beyond me
> although it appears to be well thought out and the benchmarks are nice. I
> ran the examples and I set some breakpoints and stepped into the code,
> where I understood that insert and emplace do not store the elements yet we
> are constructing them anyway which is disappointing for a string_view
> aficionado such as myself.
>
> I suppose this review can mostly be considered feedback and not any
> meaningful opposition to its inclusion in Boost, therefore my vote is to
> ACCEPT this library. Joaquin has a track record of perpetual improvements
> to the libraries he works on so even if there is an identified shortcoming,
> I doubt it will last long. I appreciate the work that went into the library
> and I think it will do well in its domain.

Thanks for your review Vinnie!

Joaquin M Lopez Munoz