From: Ivan Matek (libbooze_at_[hidden])
Date: 2025-06-08 15:13:22


On Sat, Jun 7, 2025 at 8:05 PM Joaquin M López Muñoz via Boost <
boost_at_[hidden]> wrote:

>
> Anyway, why don't you run it locally and play with the #pragmas?
>

Because when I quickly go to benchmark something, 9 hours later I am still
"just quickly" benchmarking something :)
Also, ensuring reproducibility is a pain: I do not have an unused machine
that I can SSH into, so I cannot keep my browser or random background
processes from messing with the benchmark, especially considering that
bloom uses the L3 cache a lot.

> Besides, I'm interested in results outside my local machine and GHA.
> You just have to compile this in release mode (note the repo branch):
>
>
> https://github.com/joaquintides/bloom/blob/feature/alternative-hash-production/benchmark/comparison_table.cpp
>

 Well, it was more complicated, since I already have modular Boost on my
machine, so I had to do some hacks to get CMakeLists.txt to work. Also, the
benchmark did not have a CMakeLists.txt, and I used -march=native
-mtune=native instead of what your scripts do...

But to quickly recap:

   1. There seems to be no unrolling happening unless I force it with
   pragmas.
   2. I have increased the constants to reduce the chance of noise
   affecting the results:
   - static const int num_trials=10;
   - static const milliseconds min_time_per_trial(10);
   + static const int num_trials=20;
   + static const milliseconds min_time_per_trial(50);
   3. I did this to make the tables better aligned:
   - "<table>\n"
   + "<table style=\"font-family: monospace\">\n"
   4. In terms of benchmark setup, I would add 5% of "opposite" lookups
   (e.g. some successful lookups in the failure benchmark), since I presume
   the current setup does not penalize branchy code as realistic scenarios
   would (although it is possible that real code might also have close to
   100% successes or failures). Just to be clear: I did not make this
   change.
   5. I would suggest considering switching the benchmark repo to use
   -march=native instead of -mavx2.
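
To make point 4 concrete, here is a minimal sketch of how a lookup
sequence with a small share of "opposite" outcomes could be built. The
helper name make_mixed_lookups and its parameters are hypothetical and are
not part of comparison_table.cpp; it only assumes that keys are 64-bit
integers and that two non-empty key pools (present and absent) are
available.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper, not taken from the benchmark sources.
// Returns n keys, each drawn from `minority` with probability
// `opposite_ratio` (e.g. 0.05) and from `majority` otherwise.
// Preconditions: both pools are non-empty, 0 <= opposite_ratio <= 1.
inline std::vector<std::uint64_t> make_mixed_lookups(
  const std::vector<std::uint64_t>& majority,
  const std::vector<std::uint64_t>& minority,
  std::size_t n, double opposite_ratio, std::uint64_t seed)
{
  std::mt19937_64 rng(seed);
  std::bernoulli_distribution pick_minority(opposite_ratio);

  std::vector<std::uint64_t> out;
  out.reserve(n);

  std::size_t i = 0, j = 0; // cursors cycling through each pool
  for (std::size_t k = 0; k < n; ++k) {
    if (pick_minority(rng)) out.push_back(minority[j++ % minority.size()]);
    else                    out.push_back(majority[i++ % majority.size()]);
  }
  return out;
}
```

For a mostly-successful run one would pass the inserted keys as majority
and keys known to be absent as minority, with opposite_ratio set to 0.05,
then time lookups over the returned vector. Because the position of the
minority keys is random, the branch predictor cannot simply learn a fixed
outcome.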

My tests were of the form:
taskset --cpu-list 0 {binary} {number} >> {description}.html

The CPU was an i7-13700H. Core speed was not locked and ranged between 3.2
and 3.8 GHz; it is possible that the AVX code was affecting the CPU speed,
but I did not check, and it could just be accumulated heat.

flags:
FLAGS = -O3 -DNDEBUG -fcolor-diagnostics -march=native -mtune=native

I have attached 2 runs so you can see the measurement noise on my machine.
I have also attached one unrolled run, just to show that unrolling can make
a difference, but as I said this does not matter much since by default
clang does not unroll.
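
For reference, forcing unrolling with a pragma looks roughly like the
sketch below. The function or_k_bits and its loop body are a hypothetical
stand-in for a "do k bit operations per element" loop, not the actual bloom
code, and the unroll factor of 4 is arbitrary.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a loop that touches k bits per element.
inline std::uint64_t or_k_bits(std::uint64_t hash, std::size_t k)
{
  std::uint64_t mask = 0;
  // The pragma must directly precede the loop it applies to.
#if defined(__clang__)
#pragma clang loop unroll_count(4)
#elif defined(__GNUC__)
#pragma GCC unroll 4
#endif
  for (std::size_t i = 0; i < k; ++i) {
    mask |= std::uint64_t(1) << (hash & 63);
    // Cheap rehash for illustration only; not the library's hash mixing.
    hash = hash * 0x9E3779B97F4A7C15ull + 1;
  }
  return mask;
}
```

Whether the compiler actually honors the hint can be checked in the
generated assembly, or with clang's -Rpass=loop-unroll remarks.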