Subject: Re: [boost] Flow-based programming library for Boost?
From: Julian Gonggrijp (j.gonggrijp_at_[hidden])
Date: 2012-12-15 21:31:15
Dave, I'm a bit confused after reading your post. In part I think
we're not actually disagreeing about the relative merits of static
and dynamic wiring and the possibility of combining them. In part I'm
puzzled by what seems to me to be your bolder claim: the possibility
of statically wired asynchronous concurrent message passing. In part
I wonder whether my implicit assumptions about inter-thread message
passing might have been too implicit. I'll try to touch upon each of
these issues below, as coherently as I can manage.
Dave Abrahams wrote:
> on Wed Dec 12 2012, Julian Gonggrijp wrote:
> 
>> [...] If computations can take place /at the same time/ it does 
>> not really matter whether connections are made at compile-time or run-
>> time; /waiting/ is eliminated altogether so the building of 
>> connections at run-time adds only constant overhead.
> 
> It's not just the cost of *building* of dynamic connections but the cost
> of traversing them.
I was assuming dynamic connections of the following type: a
fixed-size contiguous (circular) buffer in memory, a producer thread
that continuously appends data to the back of the buffer, and a
consumer thread that continuously removes data from the front. This
can be done lock-free and wait-free, but you're right that there is
still some cost involved, because of the pointer arithmetic and the
occasional cache miss.
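For concreteness, here is a minimal sketch of the kind of connection
I have in mind, written with C++11 atomics; the class and all its
names are my own invention, not taken from any existing library:

    #include <atomic>
    #include <cstddef>

    // Single-producer/single-consumer ring buffer. The producer only
    // writes tail_, the consumer only writes head_, so a single
    // acquire/release pair per operation suffices and neither side
    // ever blocks or retries (wait-free). Capacity is N - 1: one
    // slot stays empty to distinguish "full" from "empty".
    template <typename T, std::size_t N>
    class spsc_buffer {
        T data_[N];
        std::atomic<std::size_t> head_{0}; // next slot to read
        std::atomic<std::size_t> tail_{0}; // next slot to write

    public:
        bool push(const T& value) { // returns false when full
            std::size_t t = tail_.load(std::memory_order_relaxed);
            std::size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire))
                return false;
            data_[t] = value;
            tail_.store(next, std::memory_order_release);
            return true;
        }

        bool pop(T& value) { // returns false when empty
            std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;
            value = data_[h];
            head_.store((h + 1) % N, std::memory_order_release);
            return true;
        }
    };

The per-operation cost I mentioned is visible here: the modular
arithmetic and the two atomic loads, plus the cache line bouncing
between the cores that run the two threads.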
Still, I don't see how you could avoid this cost without giving up
the asynchronous behaviour altogether. I'll return to that question
below.
>  Of course if your sub-computations happen to have
> large granularity, these costs may be swamped by others,
Here I believe we're not disagreeing at all. You seem to be implying
that for very tightly interdependent computations, static
single-threaded connections are most efficient, while for more
coarse-grained computations, concurrency is the best solution. That
is exactly the point I wanted to make in my previous post.
> but a truly
> flexible system would allow you to tune the approach to the problem.
> 
> The accumulators library was first used in a massively-parallel context
> with each process running thousands or millions of accumulator sets.
> Obviously it really depends on your problem domain but that might be a
> more efficient way to break down the parallel processing.  Of course a
> truly flexible system would (as Jeremiah suggested his design does)
> allow you to combine static and dynamic wiring (sort of like Xpressive
> does).
> 
>> That constant overhead is worth it if accepting it helps you to avoid 
>> other problems. Note that the "wires" built by Accumulators at 
>> compile-time are /logical/ connections between /types/ that represent 
>> operations. Message passing "wires" are /physical/ connections between 
>> /threads/. 
> 
> The accumulators library's wires are just as "physical" as any others,
> they're just encoded in the instruction stream instead of in mutable
> data.  I don't see the relevance of your emphasis on the word "types."
> 
>> Instantiating the latter kind of connection at compile-time seems very
>> hard, if not impossible.
> 
> No, you could easily use threads with a system like the accumulators
> framework.
Given the context of my assumptions and the text that you're replying 
to, in all of the above you seem to be suggesting that the 
asynchronous, buffered message passing that I described could also be 
implemented statically but still concurrently and asynchronously, in 
such a way that the mutable buffer is replaced by something in the 
instruction stream. I find it very hard to imagine how this could be
done, because at compile time it's impossible to predict how the
instruction streams of the two threads will be interleaved at
run-time.
If something like that could actually be done, *please* enlighten me. 
:-)
Alternatively I could take your remark "you could easily use threads
with a system like the accumulators framework" in isolation, and then
I would interpret it quite differently: "you can build coarse-grained
single-threaded computations from fine-grained ones using
Accumulators, and then connect several such coarse-grained
computations asynchronously through dynamic buffers". As I said, I
agree with that, but given the context I'm unsure whether that is
what you meant.
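Under that second reading, the combination might look roughly as
follows; the Accumulators usage is the library's real interface,
while spsc_buffer is the hypothetical buffer sketched above:

    #include <boost/accumulators/accumulators.hpp>
    #include <boost/accumulators/statistics/stats.hpp>
    #include <boost/accumulators/statistics/mean.hpp>
    #include <boost/accumulators/statistics/variance.hpp>

    namespace acc = boost::accumulators;

    // Consumer side: the fine-grained computation (mean, variance
    // and their dependencies) is wired at compile time by
    // Accumulators; the data arrive through a dynamic inter-thread
    // connection. A real application would add a termination
    // protocol, which is omitted here.
    double consume(spsc_buffer<double, 1024>& in)
    {
        acc::accumulator_set<double,
            acc::stats<acc::tag::mean, acc::tag::variance> > set;
        double x;
        while (in.pop(x)) // dynamic wire between the threads
            set(x);       // statically wired computation within
        return acc::mean(set);
    }

The two kinds of wiring are entirely independent mechanisms here,
which is exactly why I believe they are orthogonal.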
>> Actually I think the Accumulators framework and lock-free message 
>> passing are completely orthogonal. 
> 
> Exactly.  It seems as though you're now contradicting yourself.
I meant to say that the Accumulators framework and (buffered,
asynchronous) lock-free message passing serve different needs, and
that these needs may occur independently; a program can have both
needs, and would then call for both techniques side by side. As far
as I'm concerned this is in line with everything I've said before.
If this is your point, I must conclude that our entire discussion has
been one big misunderstanding, because it was my point as well.
>> One could use a "dataflow" library to connect threads at run-time
>> while composing computations for the individual threads at
>> compile-time with Boost.Accumulators, and get the best of both worlds.
>> 
>> I hope I'm making sense. Please let me know what you think.
> 
> I think you eventually made sense :-)
Assuming that our discussion was based on a misunderstanding and that
we're actually agreeing, one thing is still bothering me. You appear
to believe that the same library should provide both modes of
composing computations, i.e. the static single-threaded one (as in
Accumulators) and the dynamic asynchronous one. I doubt that would be
the right thing to do, because the two modes of composition are not
related in their underlying logic (unlike the static and dynamic
expressions in Xpressive), and because they pose very different
requirements on the subcomputations that are to be composed.
In the dynamic multithreaded mode of composition, the producer and
the consumer can be as unrelated as any function that writes to a
std::vector and any other function that reads from it. The
Accumulators mode of composition involves much tighter coupling
between the operations, so they need to have far more in common: in
Accumulators they must be expressed as features with explicitly
declared dependencies on one another.
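To make that contrast concrete, here is real Boost.Accumulators code:
asking for tag::mean alone pulls its declared dependencies,
tag::count and tag::sum, into the same accumulator set, all wired
together at compile time. A producer and a consumer that merely
share a buffer need no such shared vocabulary.

    #include <cassert>
    #include <boost/accumulators/accumulators.hpp>
    #include <boost/accumulators/statistics/stats.hpp>
    #include <boost/accumulators/statistics/mean.hpp>
    #include <boost/accumulators/statistics/count.hpp>
    #include <boost/accumulators/statistics/sum.hpp>

    namespace acc = boost::accumulators;

    int main()
    {
        // tag::mean declares dependencies on tag::count and
        // tag::sum; all three accumulators end up in the set.
        acc::accumulator_set<double, acc::stats<acc::tag::mean> > set;
        set(1.0);
        set(2.0);
        assert(acc::count(set) == 2u); // dependency, free of charge
        assert(acc::sum(set) == 3.0);  // dependency, free of charge
        assert(acc::mean(set) == 1.5);
    }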
-Julian