$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: [boost] [unordered] equal_range is O(bucket_count()), not O(size())
From: Brad Higgins (bhiggins_at_[hidden])
Date: 2011-04-07 11:52:59
TR1 specifies that unordered_multimap::equal_range(k) worst case complexity is O(size()). However, in practice, I find that the worst case performance of boost unordred_multimap's equal_range is O(bucket_count()). Is this known?
I believe the issue is with the second iterator returned by equal_range(). Specifically, if there are sparse elements in the hash table, the implementation may have to traverse many buckets to find the second iterator.
I have a test that demonstrates the issue. The test makes 2 runs over an unordered_multimap, each using the same number of elements. In the second run, it calls rehash() in order to grow the bucket count by a large amount. Notice that the first run is quite fast, while the second run takes more than 10 seconds. Here are the results from my test:
----
$ ./main
rehash(0)
insert 1000 elements
built hash, with 1543 buckets and 1000 elements. Now, get some ranges
time to obtain 1000 elements: 00:00:00.001366
rehash(5000000)
insert 1000 elements
built hash, with 6291469 buckets and 1000 elements. Now, get some ranges
time to obtain 1000 elements: 00:00:10.818002
----
Here is my code for the test:
----
#include <iostream>
#include <boost/unordered_map.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
typedef boost::unordered_multimap<size_t, size_t> hash_int_type;
typedef hash_int_type::iterator hash_int_itr;
typedef std::pair<size_t, size_t> hash_insert_type;
void
run_test(const size_t data_size,
const size_t rehash_size)
{
hash_int_type hash;
std::cout << "rehash(" << rehash_size << ")\n";
hash.rehash(rehash_size);
std::cout << "insert " << data_size << " elements\n";
for (size_t i=0; i<data_size; i++) {
hash_insert_type ins(i,i);
hash.insert(ins);
}
std::cout << "built hash, with "
<< hash.bucket_count()
<< " buckets and "
<< hash.size()
<< " elements. Now, get some ranges\n";
boost::posix_time::ptime start(boost::posix_time::microsec_clock::local_time());
for (size_t i=data_size-1; i!=0; i--) {
typedef std::pair<hash_int_itr, hash_int_itr> ret_pair;
ret_pair ret = hash.equal_range(i);
if (ret.first != ret.second) {
hash.quick_erase(ret.first);
}
}
boost::posix_time::ptime stop(boost::posix_time::microsec_clock::local_time());
boost::posix_time::time_duration diff_time(stop-start);
std::cout << "time to obtain " << data_size << " elements: " << diff_time << "\n";
}
int
main(int argc, char* argv[])
{
run_test(1000,0);
run_test(1000,5000000);
return 0;
}
-----
Thanks,
Brad