$include_dir="/home/hyper-archives/boost-users/include"; include("$include_dir/msg-header.inc") ?>
Subject: [Boost-users] boost multithread C++ program design for low-latency large-data exchange
From: Jack Bryan (dtustudy68_at_[hidden])
Date: 2011-09-25 00:17:34
        Hi, 
I am trying to solve a network flow problem by C++ multithreading. 
Given a network (all nodes are connected by arcs, each arc is 
connected to 2 and only 2 ending nodes, one is input node and another is
 output node, each node can have multiple input arcs and output arcs), 
each node needs to do some computing and then exchange 
computing result data to its connected input and output nodes. 
Multiple nodes can be grouped into one task, which is run by one thread. In this way, the 
whole network computing workload can be partitioned into multiple tasks. All these tasks 
are pushed into a boost thread pool such that all threads can run the tasks at the same 
time. 
But, if a node (in a thread task) needs to do data exchange with another node (in another 
thread task), there is a synchronization problem. Data receiver needs to wait for data 
available in the data buffer of the data sender. 
My proram needs to partition the network such that each thread's task workload is assigned 
as evenly as possible.
If all threads share the one-large data buffer structure, the program parallelism is not 
good because the critical section is too large. Some threads have to wait the the 
one-large data buffer structure unlocked even though the part of the data structure (which 
is useful to them ) has been available for read or write. 
For example, the one-large data buffer structure has the following buffer cells: 
cell1 , cell2,  cell3 , cell4. 
When thread 1 is trying to write cell 1, it must lock the whole  data buffer structure so 
that thread 2 cannot read or write cell 2 and so on. 
So, I want to break the one-large data buffer structure into multiple distinct data cells 
according to the thread number so that each cell holds the data only needed by one thread 
task. 
For example, if we have 2 threads, we create 2 data cells that hold data needed by the 4 
thread separately. 
If we have 4 threads, we create 4 data cells that hold data needed by the 4 thread separately. 
and so on. 
My questions are: 
(1) How to design the data cell ? You can see that its size is based on the number of threads. 
(2) How to reduce synchronization overhead ? The critical section is small but the 
overhead of geting and releasing mutex may be very high if the inter-node data exchange frequency is high. 
(3) When a node's computing is done and data is written to its cell how to notify the data 
receiver node such that the notification messgae is only received by the waiting thread 
that run the receiver node computing task. All other unrelated nodes and threads are not 
impacted. 
The program is very time-sensitive and the latency of message exchange should be 
controlled very toughly and reduced as much as possible.
 
Any help is really appreciated. 
Thanks