Subject: Re: [boost] [random] new threefry random engine
From: Steven Watanabe (watanabesj_at_[hidden])
Date: 2014-04-22 12:25:03


AMDG

On 04/22/2014 08:51 AM, John Salmon wrote:
> <snip>
> With sequential
> generators, a sufficiently motivated and careful user can use the
> discard() function to initialize multiple generators in parallel.
> For most generators, discard(N) is O(N), defeating its use for this
> purpose, but for a few (including Threefry), discard(N) is O(1) making
> this a practical, but hardly elegant or natural approach to
> parallelism.
>

FWIW, discard isn't inherently O(N) for most generators.
Faster algorithms exist, but are not widely implemented.
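
For example, a linear congruential engine can support an
O(log N) discard by repeated squaring of the affine map
x -> a*x + c.  A rough sketch (the constants are Knuth's
MMIX values, chosen only for illustration; this is not a
claim about how any shipped engine implements discard):

#include <cstdint>

// Knuth's MMIX LCG constants, for illustration only.
static const std::uint64_t lcg_a = 6364136223846793005ULL;
static const std::uint64_t lcg_c = 1442695040888963407ULL;

// LCG modulo 2^64 with an O(log N) discard.
struct lcg64
{
    std::uint64_t x;

    explicit lcg64(std::uint64_t seed = 0) : x(seed) {}

    std::uint64_t operator()() { x = lcg_a * x + lcg_c; return x; }

    void discard(std::uint64_t n)
    {
        // Build the n-fold composition as x -> A*x + C from the
        // binary expansion of n; each squaring doubles the stride.
        std::uint64_t A = 1, C = 0;
        std::uint64_t sa = lcg_a, sc = lcg_c;
        while(n) {
            if(n & 1) {
                C = sa * C + sc;
                A = sa * A;
            }
            sc = sa * sc + sc;   // step squared: c' = a*c + c
            sa = sa * sa;        //               a' = a*a
            n >>= 1;
        }
        x = A * x + C;
    }
};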

> <snip>
> A better approach is to devise a trivial, application-specific mapping
> of time (iteration number) plus some stable aspect of your logical
> elements (e.g., atom id) into unique 128-bit inputs. You can now
> obtain the random values you need, wherever and whenever you need them
> simply by generating a unique 128-bit input and calling the
> RandomFunction. There is no space overhead. If your parallel
> algorithm dictates that you need the same random value, e.g., to
> perform an "update" in two different threads or nodes, you just call
> the RandomFunction with the appropriate arguments wherever you need it
> and you get the results you need, where you need them. In parallel
> applications this is much more natural than trying to divvy up a
> single stream into multiple substreams by 'discarding' or 'jumping' an
> inherently sequential state. It is cheaper and easier than
> associating a stateful generator with each of your logical objects and
> worrying about the storage and bandwidth to keep them in-sync across
> the threads/nodes of a parallel computation.
>
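
As I understand it, the usage being described looks roughly
like the sketch below.  The mixing function is only a
stand-in so that the example is self-contained; a real
RandomFunction such as threefry would take its place, and
all of the names are mine, not a proposed interface.

#include <array>
#include <cstdint>

typedef std::array<std::uint64_t, 2> block128;

// Stand-in mixer; NOT threefry or any other real RandomFunction.
inline block128 random_function(block128 key, block128 ctr)
{
    std::uint64_t x = ctr[0] + key[0], y = ctr[1] + key[1];
    for(int r = 0; r < 6; ++r) {
        x += y;  y = (y << 17 | y >> 47) ^ x;
        x = (x << 31 | x >> 33) + key[r & 1];
    }
    block128 out = {{ x, y }};
    return out;
}

// One block of random bits per (timestep, atom) pair, computed
// wherever it is needed; there is no per-atom generator state
// to store or keep in sync across threads/nodes.
inline block128 noise_for_atom(std::uint64_t seed,
                               std::uint64_t timestep,
                               std::uint64_t atom_id)
{
    block128 key   = {{ seed, 0 }};
    block128 input = {{ timestep, atom_id }};
    return random_function(key, input);
}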

The problem I have with this basically comes down
to encapsulation.

- distributions consume an unspecified number
  of random values (see the sketch after this
  list). I have no idea how to make the
  distributions work with a RandomFunction,
  besides requiring them to use a 1-to-1 mapping.
  Even with 64-bit values, this limits the
  precision of the random variates.
- Taking this one step further, how can I write
  a generic function which uses an arbitrary
  RandomFunction to generate a sequence of k
  random variates? There's no way to know a
  priori how many unique inputs I would need.
  Even if k is constant, it still depends on
  the block size of the RandomFunction.
- A RandomFunction is strictly a PRNG interface.
  It doesn't work very well for expressing other
  sources of random values, such as /dev/urandom,
  specialized devices, and www.random.org.
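
To illustrate the first point: even drawing an unbiased
integer below n by simple rejection consumes a number of
engine values that is not known in advance.  A sketch (not
necessarily the algorithm Boost's uniform_int_distribution
uses; the point is only the variable consumption):

#include <boost/random/mersenne_twister.hpp>
#include <cstdint>

// Unbiased integer in [0, n) by rejection, assuming n > 0 and a
// 32-bit Engine that produces values over the full range.
template<class Engine>
std::uint32_t uniform_below(Engine& eng, std::uint32_t n)
{
    // Largest multiple of n that is <= 0xFFFFFFFF.
    std::uint32_t limit = 0xFFFFFFFFu - 0xFFFFFFFFu % n;
    for(;;) {
        std::uint32_t r = static_cast<std::uint32_t>(eng());
        if(r < limit)
            return r % n;   // accepted: this call was enough
        // rejected: draw again; the number of engine calls per
        // variate cannot be predicted ahead of time
    }
}

int main()
{
    boost::random::mt19937 eng;
    return uniform_below(eng, 6) < 6 ? 0 : 1;   // trivial usage check
}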

I think that we can get most of the benefits of a
RandomFunction without abandoning the Engine concept
if we add a couple of extra guarantees for engines
intended for parallel use:
- seeding a new engine is fast
- distinct seeds (within some limits)
  produce uncorrelated sequences.
Both of these properties are trivial for Engines based
on RandomFunctions.
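
To make that concrete, here is a minimal sketch of the kind
of engine I have in mind: the seed lives in the key and a
128-bit counter is the only state that evolves.  The names
are illustrative rather than a proposed interface, the
mixing function again just stands in for a real
RandomFunction, and the seed_seq overloads, equality and
stream operators an Engine needs are omitted.

#include <array>
#include <cstdint>

typedef std::array<std::uint64_t, 2> block128;

// Stand-in for a real RandomFunction such as threefry; the
// engine below only relies on the (key, counter) -> block shape.
inline block128 prf(block128 key, block128 ctr)
{
    std::uint64_t x = ctr[0] + key[0], y = ctr[1] + key[1];
    for(int r = 0; r < 6; ++r) {
        x += y;  y = (y << 17 | y >> 47) ^ x;
        x = (x << 31 | x >> 33) + key[r & 1];
    }
    block128 out = {{ x, y }};
    return out;
}

class counter_engine
{
public:
    typedef std::uint64_t result_type;

    static result_type min() { return 0; }
    static result_type max() { return ~result_type(0); }

    explicit counter_engine(std::uint64_t s = 0) { seed(s); }

    // Seeding is O(1): a few stores, no warm-up loop.
    void seed(std::uint64_t s)
    {
        key_[0] = s;  key_[1] = 0;
        ctr_[0] = 0;  ctr_[1] = 0;
        pos_ = 2;                    // refill on the next call
    }

    result_type operator()()
    {
        if(pos_ == 2) {
            block_ = prf(key_, ctr_);
            if(++ctr_[0] == 0) ++ctr_[1];   // 128-bit increment
            pos_ = 0;
        }
        return block_[pos_++];
    }

private:
    block128 key_, ctr_, block_;
    unsigned pos_;
};

Distinct seeds select distinct keys, so for a good
RandomFunction the streams are uncorrelated, and an O(1)
discard falls out as well, since it is just counter
arithmetic plus a position within the current block.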

In Christ,
Steven Watanabe