$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [compute] kernels as strings impairs readability and maintainability
From: Mathias Gaunard (mathias.gaunard_at_[hidden])
Date: 2014-12-23 20:46:00
On 23/12/2014 20:21, Kyle Lutz wrote:
> While yes, it does make developing Boost.Compute itself a bit more
> complex, it also gives us much greater flexibility.
>
> For instance, we can dynamically build programs at run-time by
> combining algorithmic skeletons (such as reduce or scan) with custom
> user-defined reduction functions and produce optimized kernels for the
> actual platform that executes the code (which in fact can be
> dramatically different hardware than where Boost.Compute itself was
> compiled). It also allows us to automatically tune algorithm
> parameters for the actual hardware present at run-time (and also
> allows us to execute current algorithms as efficiently as possible
> on future hardware platforms by re-tuning and scaling up parameters,
> all without any recompilation). It also allows us to generate fully
> specialized kernels at run-time based on
> dynamic-input/user-configuration (imagine user-created filter
> pipelines in Photoshop or custom database queries in PGSQL).
>
> I think this added complexity is well worth the cost and this fits
> naturally with OpenCL's JIT-like programming model.
I could see that from the code, yes.
But nothing should prevent doing that while still keeping the original
OpenCL source code (or skeletons/templates) in separate files rather
than in C strings.
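For example, a rough sketch of what I mean (the file name, the
"%(reduce_fn)" placeholder convention and the helper functions below are
mine, not anything Boost.Compute actually ships):

    #include <fstream>
    #include <sstream>
    #include <string>

    // Read an OpenCL skeleton kept in a separate .cl file.
    std::string load_cl_source(const std::string &path)
    {
        std::ifstream file(path.c_str());
        std::stringstream ss;
        ss << file.rdbuf();
        return ss.str();
    }

    // Splice a user-defined reduction function into the skeleton at
    // run-time, then hand the result to the usual run-time compilation
    // path (e.g. program::build_with_source()). Assumes the placeholder
    // occurs exactly once in the skeleton.
    std::string instantiate_reduce_kernel(const std::string &user_function)
    {
        std::string source = load_cl_source("reduce_skeleton.cl");
        const std::string placeholder = "%(reduce_fn)";
        source.replace(source.find(placeholder), placeholder.size(),
                       user_function);
        return source;
    }

The run-time combination of skeletons with user code stays exactly as it
is; only the storage of the skeleton moves out of a C string literal.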
>> Has separate compilation been considered?
>> Put the OpenCL code into .cl files, and let the build system do whatever is
>> needed to transform them into a form that can be executed.
>
> Compiling programs to binaries and then later loading them from disk
> is supported by Boost.Compute (and is in fact used to implement the
> offline kernel caching infrastructure). However, for the reasons I
> mentioned before, this mode is not used exclusively in Boost.Compute
> and the algorithms are mainly implemented in terms of the run-time
> program creation and compilation model.
I didn't necessarily mean compiling OpenCL to SPIR (if that's indeed
what you mean by binary).
You could just make the build system automatically generate the C string
from a .cl file, for example.
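Something along these lines, say (the tool name and its arguments are
entirely made up for the sake of illustration):

    // embed_cl.cpp -- trivial pre-build step, run as e.g.
    //   embed_cl radix_sort.cl radix_sort_source.hpp radix_sort_source
    // It turns a .cl file into a header containing a C string literal.
    #include <fstream>
    #include <string>

    int main(int argc, char *argv[])
    {
        if (argc != 4) return 1;
        std::ifstream in(argv[1]);
        std::ofstream out(argv[2]);
        out << "static const char " << argv[3] << "[] =\n";
        std::string line;
        while (std::getline(in, line)) {
            // escape backslashes and quotes so each source line
            // becomes a valid string literal
            std::string escaped;
            for (std::string::size_type i = 0; i < line.size(); ++i) {
                if (line[i] == '\\' || line[i] == '"')
                    escaped += '\\';
                escaped += line[i];
            }
            out << "    \"" << escaped << "\\n\"\n";
        }
        out << ";\n";
        return 0;
    }

The .cl files stay readable and editable as OpenCL, and the library
still gets the strings it needs for its run-time compilation model.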
> Another concern is that Boost.Compute is a header-only library and
> doesn't control the build system or how the library will be loaded.
> This limits our ability to pre-compile certain programs and "install"
> them for later use by the library.
As it is, you're probably getting some binary bloat for the sole reason
that every TU gets its own copy of all your strings, in particular the
radix sort kernel.
It makes more sense for it to be a library IMHO.
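Concretely, something like this (hypothetical names, not how the headers
are actually organized):

    // Header-only, today: every TU including the header gets its own
    // copy of the kernel source string.
    static const char radix_sort_source[] =
        "__kernel void radix_sort(...) { /* ... */ }";

    // With a compiled library, the header only declares it:
    //   radix_sort.hpp
    extern const char radix_sort_source[];

    //   radix_sort.cpp, compiled once into the library:
    const char radix_sort_source[] =
        "__kernel void radix_sort(...) { /* ... */ }";

The string then exists once in the final binary regardless of how many
TUs use the algorithm.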
There is a tendency for people to prefer header-only designs because
they facilitate deployment (no separate library to build with compatible
settings), but I do not think someone should go header-only just for
that reason.
> That said, I am very interested in exploring methods for integrating
> OpenCL source files built by the build tool-chain and make loading and
> executing them seamless with the rest of Boost.Compute. One approach I
> have for this is an "extern_function<>" class which works like
> "boost::compute::function<>", but instead of being specified with a
> string at run-time, its object code is loaded from a pre-compiled
> OpenCL binary on disk. I've also been exploring a clang-plugin-based
> approach to simplify embedding OpenCL code in C++ and using it
> together with the Boost.Compute algorithms.
I do not know what you have in mind with your clang development, but I
assumed your library was sticking to oldish standard OpenCL for
compatibility with a wide variety of devices and older toolchains.
There are already some compiler projects that can generate hybrid CPU
and GPU code from a single source, turning functions into GPU kernels as
needed: C++AMP does it, CUDA does it to some extent, and now there is
SYCL, a recent addition to the family of OpenCL standards presented at
SC14, which should become the best solution for this.
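For reference, single-source SYCL code looks roughly like this (sketched
from the spec as presented; exact API names may differ between the
provisional and final revisions):

    #include <CL/sycl.hpp>
    #include <vector>

    void scale(std::vector<float> &data)
    {
        using namespace cl::sycl;
        buffer<float, 1> buf(data.data(), range<1>(data.size()));
        queue q;
        q.submit([&](handler &cgh) {
            auto acc = buf.get_access<access::mode::read_write>(cgh);
            // The lambda below is compiled as a device kernel by the
            // SYCL compiler, yet it is written as ordinary C++ in the
            // same source file as the host code.
            cgh.parallel_for<class scale_kernel>(range<1>(data.size()),
                [=](id<1> i) { acc[i] = acc[i] * 2.0f; });
        });
    }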