Subject: Re: [boost] Synchronization (RE: [compute] review)
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2014-12-30 09:08:21
On 30/12/14 14:48, Gruenke, Matt wrote:
> -----Original Message-----
> From: Boost [mailto:boost-bounces_at_[hidden]] On Behalf Of Thomas M
> Sent: Tuesday, December 30, 2014 7:37
> To: boost_at_[hidden]
> Subject: Re: [boost] Synchronization (RE: [compute] review)
>
>> If you are going to implement such RAII guards, here's a short wish-list of features / guard classes:
>>
>> a) make guards "transferable" across functions
> I agree they should be movable, but it makes no sense for them to be copyable.
>
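A minimal sketch of such a move-only guard, assuming only that wait_list::wait() blocks until every contained event completes; the name wait_guard and its interface are hypothetical, not existing Boost.Compute API:

#include <boost/compute/utility/wait_list.hpp>

class wait_guard
{
public:
    explicit wait_guard(boost::compute::wait_list &wl)
        : m_list(&wl) {}

    // movable: the "wait on destruction" duty transfers to the new guard
    wait_guard(wait_guard &&other) noexcept
        : m_list(other.m_list) { other.m_list = nullptr; }

    // not copyable: two copies would both block on the same list
    wait_guard(const wait_guard &) = delete;
    wait_guard &operator=(const wait_guard &) = delete;

    // a production version would also need to deal with exceptions from wait()
    ~wait_guard() { if(m_list) m_list->wait(); }

private:
    boost::compute::wait_list *m_list;
};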
>
>> b) a container of guards and/or a guard for a wait_list as whole
> Hmmm... I can see the benefits (convenience). I'd make it a different type, though.
>
> I assume it should hold a reference to the list? Since the guarantee is designed to block when the wait_list goes out of scope, I think it's reasonable to assume its scope is a superset of the guarantee's.
>
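Reusing the hypothetical wait_guard above, the lifetime relationship would look like this; the list is declared first, so it necessarily outlives the guard:

{
    boost::compute::wait_list wl;   // declared first, destroyed last
    wait_guard guard(wl);           // destroyed first, blocks via wl.wait()

    // ... enqueue asynchronous commands and insert their events into wl ...

}   // guard waits here, then wl itself is destroyed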
>
>> c) a guard for a command-queue as whole
>> [possibly guards for other classes as well]
> Why? Convenience?
>
> Unless you're using it as a shorthand for waiting on individual events or wait_lists, there's no need. The command_queue is internally refcounted. When the refcount goes to zero, the destructor will block on all outstanding commands.
>
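If a queue-level shorthand were still wanted, one sketch (queue_guard is hypothetical) would simply call command_queue::finish() on scope exit, which blocks until every command submitted to the queue has completed:

#include <boost/compute/command_queue.hpp>

class queue_guard
{
public:
    explicit queue_guard(boost::compute::command_queue &queue)
        : m_queue(queue) {}

    queue_guard(const queue_guard &) = delete;
    queue_guard &operator=(const queue_guard &) = delete;

    ~queue_guard() { m_queue.finish(); }  // block on all outstanding commands

private:
    boost::compute::command_queue &m_queue;
};

This is coarser than waiting on specific events, which is presumably why it would only be a convenience.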
>
>> a) + b) because something like this is really useful:
> Um... how about this:
>
> void foo()
> {
>     // setup all memory objects etc.
>
>     wait_list wl;
>     wait_list::guarantee wlg(wl);
>
>     // send data to device
>     wl.insert(cq.enqueue_write_buffer_async(devmem, 0, size, host_ptr));
>     wl.insert(cq.enqueue_write_buffer_async(devmem2, 0, size, host_ptr2));
>
>     // a kernel that reads devmem and devmem2 and writes to devmem
>     wl.insert(cq.enqueue_task(kern, wl)); // Note: wl is copied by enqueue funcs
>
>     // copy result back to host
>     wl.insert(cq.enqueue_read_buffer_async(devmem, 0, size, host_ptr, wl));
>
>     // wl.wait() would only be necessary if you wanted to access the results here.
>
>
>     // Enqueue an independent set of operations with another wait_list
>     wait_list wl_b;
>     wait_list::guarantee wlg_b(wl_b);
>
>     // send data to device
>     wl_b.insert(cq.enqueue_write_buffer_async(devmem_b, 0, size_b, host_ptr_b));
>
>     // ...
> }
>
>
Maybe you could follow the task_region design (see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4088.pdf).
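For example, a region function could collect the events enqueued inside it and wait for all of them before returning, even when an exception propagates, much as task_region joins its tasks; every name below is only a sketch, not an existing API:

#include <boost/compute/utility/wait_list.hpp>

template<class Body>
void compute_region(Body body)
{
    boost::compute::wait_list events;
    try {
        body(events);     // the body inserts the events it enqueues
    }
    catch(...) {
        events.wait();    // join before propagating, as task_region does
        throw;
    }
    events.wait();        // implicit synchronization at region exit
}

The caller's body would then read much like foo() above, without naming a guard object explicitly.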
>> With c) I have something like this in mind:
> What about this?
>
> {
>     command_queue cq(cntx, dev);
>     command_queue::guarantee cqg(cq);
>     cq.enqueue_write_buffer_async(devmem, 0, size, host_ptr);
>     transform(..., cq); // implicitly async cq.enqueue_read_buffer_async(...);
>
>     // here automatic synchronization occurs
> }
>
>
> It does presume that command_queues are local and tied to related batches of computations. Those assumptions won't always hold.
The same suggestion applies here.
Best,
Vicente