Subject: Re: [boost] Proposal: MapReduce library (single machine)
From: Joel Falcou (joel.falcou_at_[hidden])
Date: 2009-06-16 02:19:05
Craig Henderson wrote:
> I've already answered this in other threads... the scheduling is implemented
> in a policy class so other threading approaches can be used. Current
> implementations are Sequential (single thread Map followed by Reduce
> phases), and CPU Parallel to maximize CPU core utilization.
>
I saw that; the question was: for your parallel scheduler, how do you
generate the workload for each processor?
> I'm running some tests and will update the site with performance comparisons
> shortly
>
Great
> The idea of MapReduce is to map (k1,v1) --> list(k2,v2) and then reduce
> (k2,list(v2)) --> list(v2). This inevitably requires iteration over
> collections. A generic Map & Reduce task could be written to delegate to
> sequential functions as you suggest, but I see this as an extension to the
> library rather than a core component.
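To make those two signatures concrete, here is a word-count sketch in C++. It is an assumed illustration, not code from the proposed library, and `word_count_map`, `word_count_reduce` and `group_by_key` are hypothetical names. Map turns one (document name, text) pair into a list of (word, 1) pairs, the intermediate pairs are grouped by word, and reduce collapses each word's list of counts into a one-element list holding the total.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical word-count example; not part of the proposed library.
// map: (k1, v1) -> list(k2, v2), here (document name, text) -> list(word, 1)
std::vector<std::pair<std::string, int>>
word_count_map(const std::string& /*doc_name*/, const std::string& text) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream words(text);
    std::string w;
    while (words >> w) out.emplace_back(w, 1);  // one (word, 1) pair per token
    return out;
}

// Grouping step between the phases: collect all v2 values sharing a k2.
std::map<std::string, std::vector<int>>
group_by_key(const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& kv : pairs) groups[kv.first].push_back(kv.second);
    return groups;
}

// reduce: (k2, list(v2)) -> list(v2), here (word, list of 1s) -> list(total)
std::vector<int> word_count_reduce(const std::string& /*word*/,
                                   const std::vector<int>& counts) {
    int total = 0;
    for (int c : counts) total += c;
    return {total};
}
```

Note that the user code here still contains explicit loops over the text and over the counts, which is exactly the clutter discussed in the reply that follows.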
Well, canonically, running a map only requires the (k1,v1)->(k2,v2)
function. The iteration over the sequence is handled by the map skeleton.
Similarly, Reduce strictly needs only a fold-like function. Having to
specify how to iterate over the sequence is unneeded IMHO and adds clutter
to what you need to write. I don't see an actual improvement on this point
if I still have to iterate over my data myself and just use your tool to
generate the scheduling. I can do that by hand with a thread_pool and it
won't be any more verbose.
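A minimal sketch of that division of labour, assuming a std::vector input and plain std::thread workers rather than any particular thread_pool. The hypothetical `map_skeleton` and `fold_skeleton` own all of the iteration and give each hardware thread one contiguous chunk, so the user writes only the element function and the fold. Static chunking like this is just one possible answer to the workload question above, not necessarily what the proposed library does.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Number of worker threads: one per hardware thread, but at least one.
inline std::size_t worker_count() {
    return std::max<std::size_t>(1, std::thread::hardware_concurrency());
}

// Hypothetical map skeleton: the caller supplies only the element function f;
// the skeleton owns the loop and gives each thread one contiguous chunk.
// R (and T below) should not be bool: std::vector<bool> packs bits, so
// concurrent writes to neighbouring elements would race.
template <typename T, typename F>
auto map_skeleton(const std::vector<T>& in, F f)
    -> std::vector<decltype(f(in.front()))> {
    using R = decltype(f(in.front()));
    std::vector<R> out(in.size());
    const std::size_t workers = worker_count();
    const std::size_t chunk = (in.size() + workers - 1) / workers;  // 0 if empty
    std::vector<std::thread> pool;
    for (std::size_t begin = 0; begin < in.size(); begin += chunk) {
        const std::size_t end = std::min(in.size(), begin + chunk);
        pool.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
        });
    }
    for (auto& t : pool) t.join();
    return out;
}

// Hypothetical fold skeleton: the caller supplies only a binary op, assumed
// associative, and its identity element init. Each thread folds one chunk
// left to right; the partial results are then folded in chunk order.
template <typename T, typename Op>
T fold_skeleton(const std::vector<T>& in, T init, Op op) {
    if (in.empty()) return init;
    const std::size_t workers = std::min(in.size(), worker_count());
    const std::size_t chunk = (in.size() + workers - 1) / workers;
    std::vector<T> partial(workers, init);
    std::vector<std::thread> pool;
    for (std::size_t w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const std::size_t begin = std::min(in.size(), w * chunk);
            const std::size_t end = std::min(in.size(), begin + chunk);
            partial[w] = std::accumulate(in.begin() + static_cast<std::ptrdiff_t>(begin),
                                         in.begin() + static_cast<std::ptrdiff_t>(end),
                                         init, op);
        });
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), init, op);
}
```

For example, map_skeleton(v, [](int x) { return x * x; }) followed by fold_skeleton(squares, 0, std::plus<int>()) sums squares without the caller writing a single loop. Because every chunk starts from init and partials are combined in order, op needs to be associative and init its identity, but op does not need to be commutative.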
An "optimal" way to have this would be:

map_reduce<SomeSchedulingPolicy>(input_seq, output_seq, map_func, reduce_func)

with the xxx_seq arguments conforming to some IterableSequence concept and
the xxx_func arguments being function objects or PFOs conforming to the
standard map/fold prototypes. Introspection on types and on the presence of
given methods/functions (using type_traits and such) then helps find how to
iterate over the sequence and generate the appropriate, optimized iteration
code that calls map and fold where it should.
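A rough sketch of that interface, under several assumptions: map_func maps one element to a single std::pair<K2, V2>, reduce_func is a binary fold over the values of one key, and `sequential_policy`, `threaded_policy` and `has_reserve` are hypothetical names chosen for illustration. The only introspection shown is a C++17 detection trait that calls reserve() on the output sequence when it exists; a real implementation could dispatch on iterator categories and other traits in the same way.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <thread>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical scheduling policies: each runs body(i) for every i in [0, n).
struct sequential_policy {
    template <typename Body>
    static void run(std::size_t n, Body body) {
        for (std::size_t i = 0; i < n; ++i) body(i);
    }
};

struct threaded_policy {
    template <typename Body>
    static void run(std::size_t n, Body body) {
        std::size_t workers = std::thread::hardware_concurrency();
        if (workers == 0) workers = 1;  // hardware_concurrency() may return 0
        std::vector<std::thread> pool;
        for (std::size_t w = 0; w < workers; ++w)
            // Strided split: worker w handles i = w, w + workers, w + 2*workers, ...
            pool.emplace_back([=] {
                for (std::size_t i = w; i < n; i += workers) body(i);
            });
        for (auto& t : pool) t.join();
    }
};

// Introspection: true if Seq has a reserve(std::size_t) member.
template <typename Seq, typename = void>
struct has_reserve : std::false_type {};
template <typename Seq>
struct has_reserve<Seq, std::void_t<decltype(std::declval<Seq&>().reserve(std::size_t{}))>>
    : std::true_type {};

// map_func:    element -> std::pair<K2, V2>   (assumed shape)
// reduce_func: (V2, V2) -> V2                 (binary fold over one key's values)
template <typename Policy, typename InSeq, typename OutSeq, typename MapF, typename ReduceF>
void map_reduce(const InSeq& input, OutSeq& output, MapF map_func, ReduceF reduce_func) {
    using std::begin;
    using std::end;
    using Elem = std::decay_t<decltype(*begin(input))>;
    std::vector<Elem> items(begin(input), end(input));  // random access for the policy

    // Map phase: the user never writes the loop; the policy schedules it.
    using Pair = decltype(map_func(items.front()));
    using K2 = typename Pair::first_type;
    using V2 = typename Pair::second_type;
    std::vector<Pair> mapped(items.size());
    Policy::run(items.size(), [&](std::size_t i) { mapped[i] = map_func(items[i]); });

    // Grouping: collect intermediate values by key (done sequentially here).
    std::map<K2, std::vector<V2>> groups;
    for (const auto& kv : mapped) groups[kv.first].push_back(kv.second);
    std::vector<std::pair<K2, std::vector<V2>>> grouped(groups.begin(), groups.end());

    // Reduce phase: fold each key's values left to right, one key per task.
    std::vector<std::pair<K2, V2>> reduced(grouped.size());
    Policy::run(grouped.size(), [&](std::size_t i) {
        const auto& vals = grouped[i].second;  // never empty: built by push_back
        V2 acc = vals.front();
        for (std::size_t j = 1; j < vals.size(); ++j) acc = reduce_func(acc, vals[j]);
        reduced[i] = {grouped[i].first, acc};
    });

    // Call reserve() only when the output sequence provides it.
    if constexpr (has_reserve<OutSeq>::value) output.reserve(output.size() + reduced.size());
    for (auto& r : reduced) output.push_back(std::move(r));
}
```

With this shape, switching from sequential_policy to threaded_policy changes only the template argument, while map_func and reduce_func stay element-level functions; the same call works with a std::vector output (reserve() used) or a std::list output (reserve() skipped).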