From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-06-15 16:44:52


El 14/06/2025 a las 19:32, Ivan Matek escribió:
> One more datapoint that confirms what you suggested(and what
> intuitively makes sense), that for
> large K effect of breaking early for unsuccessful lookups looks more
> impressive(when filter fits in L2).
>
> [...]
>
> There is also another interesting fact about performance that will
> take a bit to explain so I ask for patience:
> fast_multiblock<8> is much faster than fast_block<11> for
> unsuccessful_lookup_time, this does not
> initially make sense because we would expect we execute same amount of
> instructions, and branch is
> easily predictable since 100% of lookups fail(beside FPR).
> I am not asm expert, but it seems clang interleaves SIMD for for 11
> case so some stuff that is not strictly
> necessary before branch is done before branch. Now this is great when
> we do not branch(early return)
> but is waste when we have close to 100% unsuccessful lookups.

Yes, this is odd. If your thesis is correct, the budget optimizer of the
compiler seems
to have decided to go branchless despite what the code says.

> This made me investigate a bit and on my machine I can get really
> great performance for
> unsuccessful_lookup_time when I combine SIMD and scalar filter for
> "remainder"
> (e.g. SIMD 8 , scalar 3 for K==11).

This is an idea that deserves some investigation. When K%8 is small,
going scalar could
actually be faster than SIMD. This is what your experiments seem to
suggest anyway.

>
> This combined filter(bobw_filter) has really great performance for
> unsuccessful lookups[...]
>
> [...]
>
> I understand that mixing 2 filters instead of modifying one to have
> mix of simd and scalar code
> is not really greatest comparison, but it was much easier for me to
> make combined filter, than
> to figure out how to mix scalar and simd filter code in one filter.

If you want to keep on playing, please find attached a sketch of a
bobw_multiblock64_avx2
filter for your enjoyment. (Why the "bobw" preffix btw?)

Joaquin M Lopez Munoz