Subject: Re: [boost] Flow-based programming library for Boost?
From: Dave Abrahams (dave_at_[hidden])
Date: 2012-12-15 17:32:40
on Wed Dec 12 2012, Julian Gonggrijp <j.gonggrijp-AT-gmail.com> wrote:
> Dave Abrahams wrote:
>
>> If you want such a library to be truly efficient, you should be able to
>> wire together the dataflows at compile-time and not pay any overhead for
>> composing large computations out of smaller ones.  The accumulators
>> library uses something like a dataflow model, with the "wiring" done
>> completely at compile-time.
>
> Because of your remark I studied the Accumulators library, and I must 
> say it's very elegant indeed. Dependencies between computations are 
> resolved at compile-time (wow!) so they don't add unnecessary overhead 
> at run-time and everything will be executed in the most efficient 
> order.
>
> However, note the word "order". Boost.Accumulators was clearly 
> designed under the assumption of single-threaded operation and tightly 
> interdependent computations. I believe it might very well be the best 
> possible approach under those assumptions, but when the assumptions 
> don't apply other considerations may call for a different solution.
>
> "Dataflow" is a very general term, but the people who advertise with 
> that word typically mean something more specific, i.e. lock-free 
> multithreading with message passing. The underlying assumptions are 
> more or less opposite to the assumptions in Boost.Accumulators: 
> computations can run concurrently without waiting for each other, as 
> long as they have input available and they can dispose of their output 
> somewhere. If computations can take place /at the same time/ it does 
> not really matter whether connections are made at compile-time or run-
> time; /waiting/ is eliminated altogether so the building of 
> connections at run-time adds only constant overhead.
It's not just the cost of *building* dynamic connections but the cost
of traversing them.  Of course if your sub-computations happen to have
large granularity, these costs may be swamped by others, but a truly
flexible system would allow you to tune the approach to the problem.
The accumulators library was first used in a massively-parallel context
with each process running thousands or millions of accumulator sets.
Obviously it really depends on your problem domain but that might be a
more efficient way to break down the parallel processing.  Of course a
truly flexible system would (as Jeremiah suggested his design does)
allow you to combine static and dynamic wiring (sort of like Xpressive
does).
> That constant overhead is worth it if accepting it helps you to avoid 
> other problems. Note that the "wires" built by Accumulators at 
> compile-time are /logical/ connections between /types/ that represent 
> operations. Message passing "wires" are /physical/ connections between 
> /threads/. 
The accumulators library's wires are just as "physical" as any others;
they're just encoded in the instruction stream instead of in mutable
data.  I don't see the relevance of your emphasis on the word "types."
> Instantiating the latter kind of connection at compile-time seems very
> hard, if not impossible.
No, you could easily use threads with a system like the accumulators
framework.
> Actually I think the Accumulators framework and lock-free message 
> passing are completely orthogonal. 
Exactly.  It seems as though you're now contradicting yourself.
> One could use a "dataflow" library to connect threads at run-time
> while composing computations for the individual threads at
> compile-time with Boost.Accumulators, and get the best of both worlds.
>
> I hope I'm making sense. Please let me know what you think.
I think you eventually made sense :-)
-- 
Dave Abrahams
BoostPro Computing
Software Development, Training
http://www.boostpro.com
Clang/LLVM/EDG Compilers, C++, Boost