From: Joaquin M López Muñoz (joaquinlopezmunoz_at_[hidden])
Date: 2025-06-09 08:31:28
On 08/06/2025 at 17:13, Ivan Matek wrote:
>
>
> On Sat, Jun 7, 2025 at 8:05 PM Joaquin M López Muñoz via Boost 
> <boost_at_[hidden]> wrote:
>
>
>     Anyway, why don't you run it locally and play with the #pragmas?
>
>
> Because when I quickly go to benchmark something 9 hours later, I am 
> just quickly benchmarking something :)
> Also, ensuring reproducibility is a pain: e.g. I do not have an unused 
> machine I can SSH into, so my browser or random background 
> processes can mess with the benchmark, especially considering 
> bloom uses the L3 cache a lot.
Hey, thanks so much for running the benchmarks! Yes, variance hurts
analysis. I'm planning to move my GHA-based benchmarks to dedicated
machines so that results are more stable.
>     Besides, I'm interested in results outside my local machine and GHA.
>     You just have to compile this in release mode (note the repo branch):
>
>     https://github.com/joaquintides/bloom/blob/feature/alternative-hash-production/benchmark/comparison_table.cpp
>
>
> Well, it was more complicated: I already have modular Boost on my 
> machine, so I had to do some hacks to get CMakeLists.txt to work; the 
> benchmark also did not have a CMakeLists.txt, and I used 
> -march=native -mtune=native instead of what your scripts do...
>
> But to quickly recap:
>
>  1. There seems to be no unrolling happening without me doing it with
>     pragmas.
>  2. I have increased constants to reduce chance of noise affecting
>     results:
>     -  static const int              num_trials=10;
>     -  static const milliseconds     min_time_per_trial(10);
>     +  static const int              num_trials=20;
>     +  static const milliseconds     min_time_per_trial(50);
>  3. I did this to make tables more aligned:
>     -    "<table>\n"
>     +    "<table style=\"font-family: monospace\">\n"
>  4. In terms of benchmark setup, I would add 5% of "opposite"
>     lookups (e.g. successes in the failed-lookup runs), since I presume
>     the current setup does not penalize branchy code the way realistic
>     scenarios would (although real code might also have close to
>     100% successes or failures). Just to be clear: I did not make
>     this change.
>  5. I would suggest considering switching the benchmark repo to use
>     -march=native instead of -mavx2
>
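Item 2 only lengthens each measurement. As a rough illustration of how constants like `num_trials` and `min_time_per_trial` are commonly used in such a harness, here is a minimal sketch. The function `measure_ns` and the median-of-trials policy are assumptions for illustration, not the actual code in comparison_table.cpp.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

using std::chrono::milliseconds;
using clk = std::chrono::steady_clock;

static const int          num_trials = 20;
static const milliseconds min_time_per_trial(50);

// Hypothetical harness: each trial calls f repeatedly until at least
// min_time_per_trial has elapsed, then records the average time per call.
// Returns the median of the num_trials averages, in nanoseconds.
template <typename F>
double measure_ns(F f) {
  std::vector<double> trials;
  for (int t = 0; t < num_trials; ++t) {
    long long runs = 0;
    const auto start = clk::now();
    clk::time_point stop;
    do {
      f();
      ++runs;
      stop = clk::now();
    } while (stop - start < min_time_per_trial);
    trials.push_back(
        std::chrono::duration<double, std::nano>(stop - start).count() / runs);
  }
  // The median is less sensitive than the mean to a trial disturbed by
  // background activity.
  std::sort(trials.begin(), trials.end());
  return trials[trials.size() / 2];
}
```

Raising both constants trades total run time (here at least 20 × 50 ms per measurement) for lower sensitivity to noise from other processes on the machine.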
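Item 4 proposes mixing about 5% of "opposite" outcomes into each lookup run. A minimal sketch of how such a lookup sequence could be built follows, assuming `int` keys and that the caller supplies separate pools of inserted (`hits`) and non-inserted (`misses`) keys. `make_mixed_lookups` is a hypothetical helper, not part of the benchmark.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: builds n lookup keys where roughly opposite_ratio of
// them come from `misses` and the rest from `hits`, then shuffles them.
// Assumes 0 <= opposite_ratio <= 1 and non-empty pools where needed.
std::vector<int> make_mixed_lookups(const std::vector<int>& hits,
                                    const std::vector<int>& misses,
                                    std::size_t n, double opposite_ratio,
                                    unsigned seed = 12345) {
  const std::size_t n_opposite =
      static_cast<std::size_t>(n * opposite_ratio + 0.5);
  std::vector<int> out;
  out.reserve(n);
  for (std::size_t i = 0; i < n - n_opposite; ++i)
    out.push_back(hits[i % hits.size()]);     // expected successful lookups
  for (std::size_t i = 0; i < n_opposite; ++i)
    out.push_back(misses[i % misses.size()]); // the "opposite" lookups
  std::mt19937 gen(seed);
  std::shuffle(out.begin(), out.end(), gen);  // avoid long runs of one outcome
  return out;
}
```

Shuffling keeps the opposite lookups from clustering, so a branch predictor cannot simply learn one long run of identical outcomes; swapping the roles of `hits` and `misses` gives the corresponding mix for a failed-lookup run.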
So, unrolling does not happen; that's out of the way. Thanks for 
investigating.
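For readers who want to experiment with the #pragmas mentioned above, here is a minimal sketch of a portable unroll hint applied to a toy k-probe bit test. The `UNROLL_HINT` macro, `insert_toy`/`may_contain_toy` and the probe scheme are hypothetical and do not reflect the library's actual hash production. `#pragma GCC unroll` requires GCC 8 or later; Clang uses `#pragma unroll`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical portable unroll hint (Clang also defines __GNUC__, so test it first).
#if defined(__clang__)
#define UNROLL_HINT _Pragma("unroll 4")
#elif defined(__GNUC__)
#define UNROLL_HINT _Pragma("GCC unroll 4")
#else
#define UNROLL_HINT
#endif

// Toy probe scheme (an assumption): probe i hits bit (hash + i * probe_step) mod nbits.
constexpr std::uint64_t probe_step = 0x9E3779B97F4A7C15ull;

void insert_toy(std::vector<std::uint64_t>& bits, std::uint64_t hash, int k) {
  const std::uint64_t nbits = bits.size() * 64;
  UNROLL_HINT
  for (int i = 0; i < k; ++i) {
    const std::uint64_t pos =
        (hash + static_cast<std::uint64_t>(i) * probe_step) % nbits;
    bits[pos / 64] |= std::uint64_t(1) << (pos % 64);
  }
}

bool may_contain_toy(const std::vector<std::uint64_t>& bits, std::uint64_t hash,
                     int k) {
  const std::uint64_t nbits = bits.size() * 64;
  bool found = true;
  UNROLL_HINT
  for (int i = 0; i < k; ++i) {
    const std::uint64_t pos =
        (hash + static_cast<std::uint64_t>(i) * probe_step) % nbits;
    // Accumulate without short-circuiting, so the loop body stays branch-free.
    found &= ((bits[pos / 64] >> (pos % 64)) & 1u) != 0;
  }
  return found;
}
```

The hint is only a request; whether the compiler actually unrolls can be checked in the generated assembly (e.g. with -S or a disassembler).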
I'll use -march=native as you suggest. As for the difference between the
original hash production scheme and the one proposed by Kostas (cells marked
with *), the numbers are not very conclusive, but it looks like Kostas's
approach incurs a slight degradation in execution time. I hope we can see this
more clearly with the upcoming GHA benchmarks on dedicated machines.
Joaquin M Lopez Munoz