From: Gavin Lambert (boost_at_[hidden])
Date: 2020-04-02 23:16:10


On 3/04/2020 08:41, Jan Hafer wrote:
> On 02.04.20 20:59, Mathias Gaunard wrote:
>> So you're saying that circular_buffer is slower on a given thread when
>> other threads are accessing their own circular_buffer in parallel?
>> That sounds unlikely to be circular buffer's fault.
>
> Yes, and I don't quite know the reason for it.
> My threads know their id to access a file-global data structure
> containing their queue/circular buffer. They start another one
> afterwards in a thread-safe way and exit on emptying the
> queue/circular buffer.

That sounds like you're allocating a single array of circular_buffers
and then accessing them from different threads.

That's basically the worst possible layout; as Mathias was saying,
adjacent buffers will end up sharing cache lines between different
cores (false sharing), and your performance will tank.

At minimum, you should embed the circular_buffer into another struct
that has sizeof() >= std::hardware_destructive_interference_size, and
make an array of that.

But better still, embed the circular_buffer into your processing classes
and don't have arrays of them at all.

(If you're pre-C++17 and don't have
std::hardware_destructive_interference_size, then using 64 works for
most modern platforms.)
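
For the "at minimum" approach, a sketch might look something like this
(the wrapper name, element type, capacity, and thread count are just
placeholders; the 64-byte fallback is the assumption mentioned above):

    #include <boost/circular_buffer.hpp>
    #include <cstddef>
    #include <new>     // std::hardware_destructive_interference_size
    #include <vector>

    #if defined(__cpp_lib_hardware_interference_size)
    constexpr std::size_t cache_line =
        std::hardware_destructive_interference_size;
    #else
    constexpr std::size_t cache_line = 64;  // common cache line size
    #endif

    // Each wrapper occupies (at least) its own cache line, so a thread
    // writing to its own buffer no longer invalidates the lines that
    // neighbouring threads are using.
    struct alignas(cache_line) padded_buffer {
        boost::circular_buffer<int> buf = boost::circular_buffer<int>(1024);
    };

    constexpr std::size_t thread_count = 4;  // however many workers you have
    std::vector<padded_buffer> queues(thread_count);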

Ideally, the circular_buffer implementation itself should also separate
all internal producer-thread members and consumer-thread members by
std::hardware_destructive_interference_size and try very hard to not
cross over. (Here the main thing that matters is write accesses.)
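
Very roughly, the internal layout you want looks something like this (a
simplified sketch of the idea, not spsc_queue's actual implementation;
it reuses the cache_line constant from the sketch above):

    #include <array>
    #include <atomic>

    struct spsc_ring {
        // written only by the producer thread
        alignas(cache_line) std::atomic<std::size_t> write_index{0};
        // written only by the consumer thread
        alignas(cache_line) std::atomic<std::size_t> read_index{0};
        // the element storage itself
        alignas(cache_line) std::array<int, 1024> slots{};
    };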

If you want to try using a more modern circular buffer that gets this
correct, have a look at Boost.Lockfree's spsc_queue.

https://www.boost.org/doc/libs/1_72_0/doc/html/boost/lockfree/spsc_queue.html
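
A minimal usage sketch (the element type and capacity here are only
placeholders):

    #include <boost/lockfree/spsc_queue.hpp>

    boost::lockfree::spsc_queue<int, boost::lockfree::capacity<1024>> queue;

    // producer thread
    void produce(int value)
    {
        while (!queue.push(value))
            ;   // queue full: spin, yield, or do other work
    }

    // consumer thread
    void consume()
    {
        int value;
        while (queue.pop(value)) {
            // ... process value ...
        }
    }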

(While you're there, note that there's also an MPMC queue. A lockfree
queue will tend to be slower than a mutex-protected std::queue in
uncontended benchmarks, but avoiding locks can win out in highly
contended or other specialised scenarios.)