$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: [boost] Lockfree review
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-07-28 15:14:41
Dear All,
Here is a quick review of the proposed Boost.Lockfree library.
Utility
-------
For some proposed libraries their utility is beyond doubt because they 
are already being used in some significant application.  In contrast 
most GSoC projects are developed more in the "hope that they will be 
useful" without a motivating application.  So we have to ask ourselves, 
does this library meet a need?  To me, lock free algorithms and data 
structures have always seemed appealing at first sight but, apart from 
some trivial cases like counters, on close inspection they turn out to 
be too complicated to bother with.  (They also lack required features, 
i.e. waiting for non-empty; I don't think I've ever written a 
producer-consumer application that didn't need the consumer to block on empty.)
The author acknowledges that these structures will typically be slower 
than conventional implementations using mutexes, but claims that their 
benefit comes in terms of better worst-case latency.  Specifically, if 
a thread is pre-empted while it holds a lock, another thread will have 
to wait until the first thread is re-scheduled and can unlock before it 
can itself lock.  But this case is exactly the one that priority 
inversion is supposed to address: when the second thread tries to lock, 
the kernel immediately re-schedules the first thread.  Yes, because of 
the kernel involvement this will probably still be slower than the 
lock-free approach, but perhaps not by much.  It seems to me that the 
author should present benchmarks to demonstrate the actual worst-case 
latency difference between this code and a lock-based implementation.
Design
------
I am happy with the design of the interface, as far as it goes; there 
are a few things that could be added though:
- Debug modes that check the thread requirements.
- Other combinations of single/multiple producer/consumer.
Implementation
--------------
I have looked only at the implementation of the ringbuffer class 
template.  It lacks comments.  There are some oddities like this:
         size_t ret = read_index - write_index - 1;
         if (write_index >= read_index)
           ret += max_size;
         return ret;
Here, a negative value may be assigned to the unsigned size_t.  I think 
this is actually safe, but it looks odd.  I have not found any bugs.
It looks like every call to enqueue() and dequeue() will access both 
the read and write indexes.  Won't this cause poor performance due to 
cache issues?  I would have been tempted to avoid this by keeping local 
'available' counts: (PSEUDO_CODE)
private:
   atomic<size_t> read_index_;
   atomic<size_t> write_index_;
   size_t read_available_;
   size_t write_available_;
public:
   bool enqueue(...) {
     if (write_available_==0) {
       // In this case, we access read_index_ which is probably cached 
by the consumer thread,
       // which is expensive.  But this is rare.
       write_available_ = (read_index_ + max_size - write_index_ - 1) % max_size;
     }
     if (!write_available_) return false;  // full
     buffer[write_index_.load(memory_order_relaxed)] = data;
     write_index_.store((write_index_+1)%max_size);
     --write_available_;
   }
   // dequeue similarly...
Documentation
-------------
I'm not impressed by the documentation.  It lacks rationale and 
information about the implementation.  It has some formatting issues 
and typos (e.g. lose/loose).  Platform requirements (i.e. the 
Boost.Atomic and/or std::atomic requirements and supported 
architectures) should be explained.
The reference documentation involves more clicks than I would expect to 
reach the actual content.
Some functions are marked as "for debugging only", e.g. the empty() and 
reset() [which should probably be called clear()] methods.  It's not 
clear to me why these methods must be unsafe; for example, for the 
ringbuffer, I would expect the consumer thread to safely be able to 
call empty() and the producer to safely be able to call full() (which 
is not provided).  Rationale for these restrictions (or relaxations of 
them) would be useful.
- Did you try to use the library?  With what compiler?  Did you have 
any problems?
I considered exercising the library on my ARM systems, but I realised 
that it requires 64-bit atomics; currently, Boost.Atomic only provides 
32-bit atomics on ARM.  (This raises the whole issue of the 
Boost.Atomic dependency.  I'm presuming that Tim intends not to bundle 
a Boost.Atomic snapshot with the library if approved.  If that changes, 
we really need another review.)
- How much effort did you put into your evaluation? A glance? A quick 
reading? In-depth study?
A few hours reading the docs and code.
- Are you knowledgeable about the problem domain?
I know a certain amount about atomics and lock-based concurrent 
programming, but not much about lock-free data structures.
- Please always state in your review, whether you think the library 
should be accepted as a Boost library!
I think it should be accepted provided that:
* The author demonstrates some measurable benefit over lock-based data structures.
* The documentation is improved to describe the implementation and to 
provide rationale for design decisions.
* The library is not added to Boost until Boost.Atomic has been 
reviewed and approved (or, available std::atomic implementations work 
with it).
Regards,  Phil.