Subject: Re: [boost] [asio] Bug: Handlers execute on the wrong strand (Gavin Lambert).
From: Niall Douglas (s_sourceforge_at_[hidden])
Date: 2013-10-25 18:09:27
On 25 Oct 2013 at 19:39, Gavin Lambert wrote:
> >> upside most of the guts are entirely lock-free (though not wait-free,
> >> since it's based on Boost.LockFree's queue).
lockfree::queue isn't actually hugely performant. Most lock-free code 
isn't when compared to lock-based implementations, because you gain in 
worst-case execution time by sacrificing average-case execution 
time. The only major exception is lockfree::spsc_queue, which is 
indeed very fast by any metric.
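
As a rough illustration of why the single-producer/single-consumer case can be so cheap, here is a minimal sketch of a bounded SPSC ring buffer using only the standard library. It is not Boost.Lockfree's implementation; SpscRing and spsc_demo_sum are hypothetical names made up for this example. Because each index is written by exactly one thread, no compare-and-swap loop is needed, only an acquire load and a release store per operation.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical illustration only; this is NOT Boost.Lockfree's spsc_queue.
// Bounded single-producer/single-consumer ring buffer. One slot is always
// left empty so that "full" and "empty" can be told apart.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least two slots");
public:
    SpscRing() : head_(0), tail_(0) {}

    // Call from the producer thread only. Returns false if the ring is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % N;
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    // Call from the consumer thread only. Returns false if the ring is empty.
    bool pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[head];
        head_.store((head + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }

private:
    T buf_[N];
    std::atomic<std::size_t> head_;  // written by the consumer only
    std::atomic<std::size_t> tail_;  // written by the producer only
};

// Pushes 1..count from the calling thread, pops them on a second thread,
// and returns the sum the consumer saw.
inline long long spsc_demo_sum(int count) {
    SpscRing<int, 64> ring;
    long long sum = 0;
    std::thread consumer([&] {
        int v = 0;
        for (int received = 0; received < count;) {
            if (ring.pop(v)) {
                assert(v == received + 1);  // elements arrive in FIFO order
                sum += v;
                ++received;
            } else {
                std::this_thread::yield();
            }
        }
    });
    for (int i = 1; i <= count; ++i)
        while (!ring.push(i)) std::this_thread::yield();
    consumer.join();  // join makes the consumer's writes to sum visible here
    return sum;
}
```

Calling spsc_demo_sum(1000) moves the integers 1 to 1000 across the ring and returns their sum. A multi-producer queue cannot use this trick, since several threads would contend for the same index.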
> > Asio at Windows uses IOCP (default settings, using asio::io_service for task
> > scheduling) and that is the (theoretical) reason of better thread scheduling
> > for the Asio-based thread pool. Sometimes it's really visible.
> 
> It's also full of mutexes though, which is why it didn't work out for 
> me.  (Note that I was using Boost 1.53 when testing Asio; maybe this has 
> changed in future versions, although I heard that 1.54 picked up a bug 
> in the IOCP reactor.)
I saw lost wakeups during parallel writes with the ASIO in Boost 1.54, 
so I disabled parallel writes in AFIO. The lost wakeups appear to be 
gone in 1.55, so AFIO now parallelises everything as it was designed 
to. That might mean the ASIO in 1.55 is fixed.
> I'm not sure exactly which lock triggered the slow path (my logging was 
> only sufficient to show that it was one of the ones inside Asio, but not 
> which one).  But as the prior email said, given reuse of strand 
> implementations between supposedly independent strands, that seems like 
> a likely candidate.  (Though it didn't take long for the latency spikes 
> to manifest -- typically they'd start after a couple of minutes and then 
> recur roughly every 10-30 seconds.)
ASIO is, once you compile it with optimisation, really a thin wrapper 
around Windows I/O completion ports that does a lot of mallocs and 
frees. Any latency spikes are surely due to either IOCP or the memory 
allocator causing a critical section to exceed its spin count and 
therefore go to sleep in the kernel?
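
To make the spin-count mechanism concrete, here is a minimal portable sketch of a lock that spins for a bounded number of attempts and then blocks. It is an illustration under assumptions, not the Windows critical section implementation: SpinThenBlockMutex, slow_path_hits and contended_increments are hypothetical names, and std::mutex stands in for the kernel wait. The counter records how often the spin budget was exceeded, which is the event suspected of causing the spikes.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration only; NOT how a Windows CRITICAL_SECTION is
// implemented. Spins on try_lock() up to spin_count times, then falls back
// to a blocking lock() that may put the thread to sleep.
class SpinThenBlockMutex {
public:
    explicit SpinThenBlockMutex(unsigned spin_count)
        : spin_count_(spin_count), slow_path_hits_(0) {}

    void lock() {
        for (unsigned i = 0; i < spin_count_; ++i)
            if (inner_.try_lock()) return;  // fast path: acquired while spinning
        // Spin budget exhausted: count it, then take the blocking path.
        slow_path_hits_.fetch_add(1, std::memory_order_relaxed);
        inner_.lock();
    }

    void unlock() { inner_.unlock(); }

    // Number of times a caller ran out of spins and had to block.
    unsigned long slow_path_hits() const {
        return slow_path_hits_.load(std::memory_order_relaxed);
    }

private:
    std::mutex inner_;
    unsigned spin_count_;
    std::atomic<unsigned long> slow_path_hits_;
};

// Runs `threads` threads that each increment a shared counter `iters` times
// under the lock, and returns the final counter value.
inline long contended_increments(unsigned spin_count, int threads, int iters) {
    SpinThenBlockMutex mtx(spin_count);
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(mtx);
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    assert(counter == static_cast<long>(threads) * iters);
    return counter;
}
```

Logging slow_path_hits() per lock instance would be one way to find out which lock is taking the slow path, rather than only knowing that one of them did.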
> I haven't done a head-to-head benchmark on each (and it wouldn't 
> surprise me if Asio were faster than mine for many loads -- and it's 
> definitely more flexible than I made mine) but so far my one is doing at 
> least as well as Asio on production loads but without the latency spikes 
> from the locks.  Still very early days yet though.
If you're on Haswell, you might look into my memory transaction 
implementation in AFIO. It uses Intel TSX if runtime detection finds 
it, otherwise it falls back to a policy-composed spin lock (yes, I 
know I did NIH with yet another Boost spinlock implementation, but 
hey, mine is policy-composed so you can vary spin counts etc!!!). It 
works on Intel's TSX simulator, but I would really love to know if it 
works on real TSX hardware.
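
For anyone unfamiliar with the term, here is a minimal sketch of what a policy-composed spin lock can look like. None of these names come from AFIO: PolicySpinLock, SpinThenYield, AlwaysSpin and contended_count are hypothetical, and the sketch covers only the spin lock fallback, not the TSX path or the runtime detection.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical back-off policies; the names are NOT taken from AFIO.
// Spin freely for the first Limit failed attempts, then yield on each retry.
template <unsigned Limit>
struct SpinThenYield {
    static void backoff(unsigned attempt) {
        if (attempt >= Limit) std::this_thread::yield();
    }
};

// Never yield; burn CPU until the lock is acquired.
struct AlwaysSpin {
    static void backoff(unsigned) {}
};

// Test-and-set spin lock whose behaviour under contention is chosen at
// compile time through BackoffPolicy.
template <typename BackoffPolicy>
class PolicySpinLock {
public:
    void lock() {
        for (unsigned attempt = 0;
             flag_.test_and_set(std::memory_order_acquire); ++attempt)
            BackoffPolicy::backoff(attempt);
    }
    bool try_lock() { return !flag_.test_and_set(std::memory_order_acquire); }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Runs `threads` threads that each increment a shared counter `iters` times
// under the given lock type, and returns the final counter value.
template <typename Lock>
int contended_count(int threads, int iters) {
    Lock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    for (auto& th : pool) th.join();
    assert(counter == threads * iters);
    return counter;
}
```

Varying the spin count is then a matter of changing a template argument, for example PolicySpinLock<SpinThenYield<100> > versus PolicySpinLock<SpinThenYield<4000> >, with no change to the calling code.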
Niall
-- Currently unemployed and looking for work. Work Portfolio: http://careers.stackoverflow.com/nialldouglas/