$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [unordered] unordered_set::erase() complexity bug?
From: Christopher Jefferson (chris_at_[hidden])
Date: 2009-11-23 18:14:55
On 23 Nov 2009, at 23:23, Stefan Strasser wrote:
>
> I think there is something very wrong with boost's implementation of
> unordered_set/map.
>
> 1 unordered_set<int> s;
> 2 for(int c=0;c<nr;++c) s.insert(c);
> 3 for(int c=nr-1;c>=0;--c) s.erase(s.find(c));
>
> results:
>
> nr: time
> 20000: 0m0.244s
> 40000: 0m0.940s
> 60000: 0m5.584s
> 80000: 0m4.232s
> 100000: 0m34.814s
> 120000: 0m33.486s
>
>
> the function that's causing this is erase(), which is specified to be O(1)
> average, O(n) worst.
> I can't think of a reason why it should be O(n) in this case (perfect hash,
> erase doesn't rehash).
> but even if it was, the results should be at worst quadratic.
I suspect the problem you are seeing is that erase returns an iterator to the next member of the unordered_set, which is a possibly O(n) search if the set is sparsely populated, and in particular if, as you get here, you erase the elements in a particularly bad order. I'm not sure why this doesn't occur with the intrusive containers.
One way to significantly reduce this problem is to use a different hash function, which shuffles the values more.
I don't know if there is an algorithmic improvement which can fix this. there certainly isn't an obvious one.
Chris