Subject: Re: [Boost-users] mapped_region locks on multithread
From: Brian Budge (brian.budge_at_[hidden])
Date: 2012-06-09 11:51:46
On Sat, Jun 9, 2012 at 1:34 AM, Mikhail Eremin <meremin_at_[hidden]> wrote:
> Hello,
> SETTING:
> - There is an application, written using Boost Template library, meant for
> QUICK processing of bulk text files (cca 50-100Gb each).
> - There is a huge, quick and expensive piece of hardware with HUGE amount of
> RAM and multiple CPU.
> - There is [theoretically] any possible UNIX-like OS, even Microsoft
> Windows(R) is considered.
> - Boost Thread Pool extension is used; previously memory mapped files
> through memory_segment have been used, now got rid of the entire
> Boost::interprocess.
> - There are NO explicit data items in the application's algorithm to be
> shared by threads, each has its own piece of input file, thus - there is NO
> explicit concurrency.
> PROBLEM:
> - Ensure fast processing without locks and threads sleeping.
> Currently the threads sleep on some internal mutex. We thought it's been
> boost::interprocess (specifically - mmap, wrapped by a mutex), but it
> apparently isn't so.
>
> SPECIFIC QUESTION:
> - How could we get rid of Boost locks?
>
> Mike
Okay, so you have enough memory to map an entire file into memory at
once? Are the files read-only? Where is the Boost lock that you want
to get rid of? The threadpool library probably uses a lock on a queue
somewhere.
You could certainly write this without a threadpool. I'd imagine that
the cost of launching threads will be insignificant compared to
running the algorithm on these regions:
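    #include <stdint.h>
    #include <utility>
    #include <vector>
    #include <boost/atomic.hpp>
    #include <boost/thread.hpp>

    // (begin, end) offsets of each region in the mapped file
    std::vector< std::pair<uint64_t, uint64_t> > regions;
    // index of the next unclaimed region, shared by all worker threads
    boost::atomic<size_t> nextRegion;

    struct RegionThread {
        /*mmap info variables*/
        RegionThread(/*mmap info*/) : /*mmapinfo member(mmap info) */{}
        void operator() () {
            while(true) {
                // atomically claim the next region; no mutex involved
                size_t next = nextRegion.fetch_add(1);
                if(next >= regions.size()) { break; }
                std::pair<uint64_t, uint64_t> const &region = regions[next];
                /*perform algorithm on region of mmapped file...*/
            }
        }
    };

    void operateOnFile(/*some mmap info*/) {
        regions.clear();
        // set up regions for this file
        nextRegion = 0;  // reset the counter before starting the threads
        boost::thread_group tg;
        for(size_t i = 0; i < boost::thread::hardware_concurrency(); ++i) {
            tg.create_thread(RegionThread(/*mmap info*/));
        }
        tg.join_all();
    }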
If you can statically schedule the work into sets of regions that each
thread will work on, this is even easier, and can be done without even
an atomic variable.
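A minimal sketch of that static variant, under a few assumptions: it
uses std::thread from C++11 instead of boost::thread_group so that it
compiles on its own, and the names Region, processRegion and
processStatically are hypothetical. processRegion is only a stand-in
for the real algorithm (it returns the region length). Thread t takes
regions t, t + numThreads, t + 2*numThreads, and so on, and writes
only to its own result slot, so the threads share no counter at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <utility>
#include <vector>

// (begin, end) byte offsets of one region of the mapped file.
typedef std::pair<std::uint64_t, std::uint64_t> Region;

// Hypothetical stand-in for the real per-region algorithm:
// it simply returns the length of the region.
std::uint64_t processRegion(Region const& r) {
    return r.second - r.first;
}

// Static schedule: thread t handles regions t, t + numThreads, ...
// Each thread writes only to results[t], so no atomic or lock is needed.
std::vector<std::uint64_t> processStatically(std::vector<Region> const& regions,
                                             std::size_t numThreads) {
    assert(numThreads > 0);
    std::vector<std::uint64_t> results(numThreads, 0);
    std::vector<std::thread> threads;
    threads.reserve(numThreads);
    for (std::size_t t = 0; t < numThreads; ++t) {
        threads.emplace_back([&regions, &results, t, numThreads] {
            std::uint64_t local = 0;  // accumulate locally, store once at the end
            for (std::size_t i = t; i < regions.size(); i += numThreads) {
                local += processRegion(regions[i]);
            }
            results[t] = local;
        });
    }
    for (std::thread& th : threads) {
        th.join();
    }
    return results;
}
```

The trade-off is load balance: if some regions take much longer than
others, the atomic counter version above keeps all threads busy, while
a fixed assignment may leave some of them idle near the end. Note also
that hardware_concurrency() is allowed to return 0 when the value
cannot be determined, so a caller should fall back to some default
thread count in that case.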
Brian