$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [heap][formal review]
From: Tim Blechmann (tim_at_[hidden])
Date: 2011-06-13 10:18:22
hi thomas,
thanks for your formal review!
> -- One inconvenience for me was the use of max-heaps instead of min-heaps.
> Well, one could argue that a priority_queue should return the elements
> with the highest priority first (but as Wikipedia says: "some conventions
> consider lower priorities to be higher"), and heap-sort uses a max-heap.
> Since these are the two application where stl uses a heap, it's clear why
> stl implements a max-heap. But the applications where it is beneficial to
> call increase or decrease directly normally use a min-heap. Now I have to
> call increase even so the key value actually has decreased...
there are two conventions to consider: min-heaps (which are typically used in
textbooks) and max-heaps (which are used in the stl heap functions). i prefered
to be consistent with the stl.
likewise i am using the `top()' to extract the value with the highest priority,
not `find_min()' like it is often used in the literature ...
> -- I was
> left a bit with the feeling that I hadn't made the most efficient use of
> Boost.Heap.
the only way to find the best data structure is to test with a specific
workload. boost.heap does not provide a `one-size-fits-all' solution, but a
toolbox of different configurable data structures.
> For example, I quickly found out that calling update just with
> a handle (as I had implemented it first) was suboptimal for
> fibonacci_heap.
suboptimal in what way?
> - The performance numbers for Boost.Heap were neither bad nor outstanding.
> -- The numbers for b_heap are not convincing
> -- The numbers for fibonacci_heap seem to be the best.
> -- The pairing_heap has a smaller compare count than fibonacci_heap, but
> actually performs worse than fibonacci_heap in this test. Considering
> statements like "Experimental studies indicate that pairing heaps actually
> outperform Fibonacci heaps", I wonder whether the implementation of
> pairing_heap could be improved so that it actually outperforms
> fibonacci_heap.
for a performance comparison, one will have to consider
* expenses of comparison
* expenses of swapping data
* size of the managed data
* workload (mixture/order of API calls)
i suppose for every data structure one can come up with a synthetic benchmark,
so that this data structures outperforms every other implementation.
> > What is your evaluation of the design?
>
> I guess it's OK. What makes me a bit nervous is the statement "The library
> is not targeted in any specific domain nor directly related to any
> specific libraries". A good interface often is the result from an
> abstraction over different applications. The other way (coming from the
> data structure/algorithm) risks bloating the interface with unnecessary
> functionality. For example, somebody asked during the review for
> "iteration over the heap in priority order". Is this really something that
> many applications requiring a heap will benefit from? I heavily doubt
> this.
testing two heaps for equivalence will benefit from it. of course you can ask,
why do you want to test that ...
> > Do you think the library should be accepted as a Boost library?
>
> No, the library should not be accepted unless
> - The current review either magically goes back on track and at least 3
> other persons also submit a proper review, or another (mini?) review with
> proper participation is held later.
... i am a bit disappointed by the lack of reviews myself. given the fact that
this is the only one that actually contains a vote and the review stoped
yesterday i would appreciate other feedback
> - Either a convincing example where b_heap outperforms the
> other heaps is presented, or it's use should be discouraged in the
> documentation.
http://article.gmane.org/gmane.comp.lib.boost.devel/220245
> - The documentation says a bit more about the different
> heaps and their trade offs (amortized time <-> worst case), and generally
> tries to give recommendation to the user which heap will likely best fit
> his application, and which settings he should try (for example something
> like that arity<4> is a good compromise for a d_ary heap between cache
> locality and compare count).
a section of `best practices' may be helpful?
> I used the negative form of "Yes, the library should be accepted under the
> following conditions", because I'm unhappy with how the review went. The
> announcement was sent out too late, without stating the dates of the
> review, much of the initial discussion was about the code collaborator
> tool, but I haven't seen any "real" review, despite that fact that there
> seems to be interest in the library. There were some discussions on Code
> Collaborator, and the tool was actually not bad in pointing to concrete
> problems in the actual code. Tim quickly addressed these, but a good
> review would have some discussions and actual opinions and suggestions.
well, i would appreciate more reviews myself ... however if you are the only
person who includes a vote and this vote is no, boost.heap should be rejected.
no matter if the reason is `documentation', `code', `performance' or `review
process'.
> It
> is also unclear to me how well Boost.Heap performs with respect to
> boost/graph/detail/d_ary_heap.hpp. I could have tried to also include this
> heap in my heap test/benchmark, but as the review seemed to have silently
> died already, I wasn't motivated enough for doing this.
iirc bgl heap data structures are pretty specific (optimized for integers).
thanks for taking your time to review boost.heap.
tim