From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-06-09 08:31:28


On 08/06/2025 at 17:13, Ivan Matek wrote:
>
>
> On Sat, Jun 7, 2025 at 8:05 PM Joaquin M López Muñoz via Boost
> <boost_at_[hidden]> wrote:
>
>
> Anyway, why don't you run it locally and play with the #pragmas?
>
>
> Because when I go to benchmark something 9 hours later, I am
> just quickly benchmarking something :)
> Also, ensuring reproducibility is a pain: e.g., I do not have an unused
> machine I can SSH into to keep my browser or random
> background processes from messing with the benchmark, especially considering
> bloom uses the L3 cache a lot.

Hey, thanks so much for running the benchmarks! Yes, variance hurts
analysis. I'm planning to move my GHA-based benchmarks to dedicated
machines so that results are more stable.

> Besides, I'm interested in results outside my local machine and GHA.
> You just have to compile this in release mode (note the repo branch):
>
> https://github.com/joaquintides/bloom/blob/feature/alternative-hash-production/benchmark/comparison_table.cpp
>
>
>  Well, it was more complicated: I already have modular Boost on my
> machine, so I had to do some hacks to get CMakeLists.txt to work; also,
> the benchmark did not have a CMakeLists.txt, and I used
> -march=native -mtune=native instead of what your scripts do...
>
> But to quickly recap:
>
> 1. There seems to be no unrolling happening without me doing it with
> pragmas.
> 2. I have increased the constants to reduce the chance of noise affecting
> results:
> -  static const int              num_trials=10;
> -  static const milliseconds     min_time_per_trial(10);
> +  static const int              num_trials=20;
> +  static const milliseconds     min_time_per_trial(50);
> 3. I did this to make the tables better aligned:
> -    "<table>\n"
> +    "<table style=\"font-family: monospace\">\n"
> 4. In terms of benchmark setup, I would add 5% of "opposite"
> lookups (e.g., successful lookups mixed into the failing ones), since I
> presume the current setup does not penalize branchy code as realistic
> scenarios would (although it is possible that real code also has
> close to 100% successes or failures). Just to be clear: I did
> not make this change.
> 5. I would suggest considering switching the benchmark repo to use
> -march=native instead of -mavx2.
>

So, unrolling does not happen; that is out of the way. Thanks for
investigating.
I'll use -march=native as you suggest. As for the difference between the
original hash production scheme and the one proposed by Kostas (cells marked
with *), the numbers are not very conclusive, but it looks like Kostas's
approach incurs a slight degradation in execution time. I hope we can see
this more clearly with the upcoming GHA benchmarks on dedicated machines.

Joaquin M Lopez Munoz