Subject: Re: [boost] [Heaps] benchmarks
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-06-08 03:38:54


hi phil,

> http://chezphil.org/tmp/bench.tgz
>
> This is a "work queue", i.e. it's a queue of
> boost::function<void(void)> "work items" with float priorities. It's
> similar to a real application with fast (burst) fill and gradual drain;
> the pattern here is 10 pushes and 1 pop, repeated until all the work
> has been added, and then all of the work is popped in sequence. The
> priorities are essentially random, and there are 100 000 items in total.

thanks for your benchmark ... the pattern is different from the ones that i have
tested.

> I was unable to try the proposed d_ary_heap because I couldn't find a
> working syntax for the arity template parameter - any ideas anyone?

:) the scope operator uses two colons:

boost::heap::d_ary_heap< value_t,
                         boost:heap::arity<4>,  // single colon: does not compile
                         boost::heap::compare<compare_priorities> >

boost::heap::d_ary_heap< value_t,
                         boost::heap::arity<4>, // two colons: works
                         boost::heap::compare<compare_priorities> >

> I was expecting the b_heap to be faster, but it is not; it is
> substantially slower. What is going on there? Does it all come down
> to size of data, cache size, etc? Ideally, an "improved"
> implementation like this would be able to fall back to the naive
> implementation in cases where it does not function any better. Or
> maybe I'm using it wrongly.

well, benchmarking this can be tricky ... first, your work item is a
boost::function<> (iirc three pointers) plus a float ... so 16 bytes on 32-bit
or 28 bytes on 64-bit ... on 32-bit systems the items will line up better with
actual physical cache lines ...
second ... synthetic benchmarks are not necessarily the best way to test
cache-aware data structures: when stressing the heap, all of its content will
be located in the cpu caches ... so you won't necessarily see the benefit of
the cache-awareness, since the data of the non-cache-aware data structure is
still in the cpu cache, unless you manually add some work to thrash the
caches ... in your case, you will have a maximum of 90000 items in the queue;
on 32-bit platforms this is roughly 1.4 MB ... you won't see a big effect if
this all fits into the caches ...

not sure if there is a good way to artificially thrash the caches in order to
observe the effect in a microbenchmark ...

> My understanding is that the fibonacci, binomial and other heaps only
> improve things when other operations e.g. merging are needed - is that
> correct?

more or less ... unlike the container adaptors, the fibonacci heap is the only
implementation that provides amortized O(1) insert and increase ...

cheers, tim