Boost mailing page: Re: [bloom] Benchmarks with Knuth multiplier-based hash production

Date view	Thread view	Subject view	Author view

From: Joaquin M LÃ³pez MuÃ±oz (joaquinlopezmunoz_at_[hidden])
Date: 2025-06-11 16:33:37

Next message: Alexander Grund: "Re: [atomic] RFC: Bitwise operations for enums"
Previous message: Andrey Semashev: "Re: [atomic] RFC: Bitwise operations for enums"
In reply to: Ivan Matek: "Re: [bloom] Benchmarks with Knuth multiplier-based hash production"
Next in thread: Ivan Matek: "Re: [bloom] Benchmarks with Knuth multiplier-based hash production"
Reply: Ivan Matek: "Re: [bloom] Benchmarks with Knuth multiplier-based hash production"

El 10/06/2025 a las 14:52, Ivan Matek escribiÃ³:
> One more thing I noticed/checked and I believe may be of interest,
> again not in particular for this proposed Knuth multiplier change, but
> in general.
>
> Some filters have much weaker performance for mixed lookup.
> To be honest I have expected that for some(branch misprediction) but I
> do not know enough about all filters to say if slowdowns I have
> observed are as expected.
> Interesting that some fast_ have bad performance for mixed lookup,
> although naively I would assume SIMD code is branchless. I presume the
> issue is that sometimes we operate on more than 256 bits so there is
> branching...

fast_multiblock* has branches (early termination on lookup)
at k = 8, 16, ..., that is, whenever a full subblock (a pair of __m128is,
a __m256i, a pair of __m256is or a uint32x4x2_t, depending on the
subfilter and the SIMD technology) has been processed.

> What I find interesting beside this slowdown is that
> unordered_flat_set Â also has bad performance for mixed lookup so bloom
> filters that do not degrade on mixed lookup look much better than if
> we just compare 100%/0% cases.
> For example if we round performance to 1 decimal digit and compare
> unordered_flat set
> 4.7/3.1/*9.4*
> vs
> filter<int, 1ul, block<unsigned long, 7ul>, 1ul, hash<int>,
> allocator<int>, mcg_and_fastrange>
> 3.3/3.3/*3.3*
>
> comparison looks much much different if we drop mixed
> performance(bolded part).

Unlike fast_multiblock*, [multi]block is branchless, so your results
make sense. I wonder if multiblock<K> should be made branchful as
the working memory area can get quite large for large values of K.

Joaquin M Lopez Munoz

Next message: Alexander Grund: "Re: [atomic] RFC: Bitwise operations for enums"
Previous message: Andrey Semashev: "Re: [atomic] RFC: Bitwise operations for enums"
In reply to: Ivan Matek: "Re: [bloom] Benchmarks with Knuth multiplier-based hash production"
Next in thread: Ivan Matek: "Re: [bloom] Benchmarks with Knuth multiplier-based hash production"
Reply: Ivan Matek: "Re: [bloom] Benchmarks with Knuth multiplier-based hash production"

Date view	Thread view	Subject view	Author view