From: Topher Cooper (topher_at_[hidden])
Date: 2004-07-12 10:53:59
On Monday, July 12, 2004, at 03:37 AM, Anders Edin wrote:
>
>> One of the important characteristics of a pseudo-random number
>> generator is, oddly enough, determinacy.  Given the same seed, you get
>> the same sequence and can therefore reproduce the exact same results.
>> This allows results to be checked and bugs to be tracked down reliably.
>
> Well, this is of course fine for debugging. However, when running a
> simulation shouldn't you rather think of statistics than of finding bugs?
>
First off, are you proposing that you carefully use one PRNG that does 
not have hidden, global state to do your debugging and validation, then 
rip it out of your code to replace it with a different PRNG with a 
different interface to produce your actual runs with un-debugged, 
un-validated code?  I don't think you thought that one through.
Secondly, in the three areas I mentioned -- statistical, scientific and
simulation applications -- you should never consider that you have stopped
debugging.  You make what appears to be a successful run and record your
results.  Sometime later you do a different one that does not seem fully
consistent with the earlier run.  What do you do?  Pretend that there is no
problem?  Pick one of the runs at random and throw it out?  Include a
footnote saying that there may be something wrong with your results, but
you have no idea what?  Or take the seeds you recorded from each run,
recreate the runs and try to resolve the issue?
This, folks, is the simulation equivalent of cleaning your test-tubes in a
chemistry lab -- a fundamental, elementary lab procedure that is necessary
for reliable results.
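To put that in concrete terms, recording the seed is all it takes.  Here is
a minimal sketch using Boost.Random types (the particular engine,
distribution and seed value are illustrative choices, not a
recommendation):

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_real.hpp>
    #include <boost/random/variate_generator.hpp>
    #include <iostream>

    int main()
    {
        // Log this value alongside the results of the run.
        const unsigned int seed = 20040712u;

        boost::mt19937 engine(seed);
        boost::uniform_real<> unit(0.0, 1.0);
        boost::variate_generator<boost::mt19937&, boost::uniform_real<> >
            draw(engine, unit);

        // Re-running with the same recorded seed reproduces exactly the
        // same sequence of draws.
        for (int i = 0; i < 5; ++i)
            std::cout << draw() << '\n';
    }

Keep the seed in the run's log and the code under source control, and any
run can be recreated bit for bit.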
>> If another thread, or just a library used by your application (for
>> example, something that uses pseudo-random numbers for dithering a
>> graphic display), was using the same singleton engine as your
>> application this characteristic would be lost.  The number of values
>> drawn by the other parts of your system might change or reseeding of
>> the engine might happen without your knowledge.
>
> If you use one generator of the same type in each thread, but within the
> same simulation application, how do you know that the different threads
> are not correlated in the statistics sense? Do you set the seeds far
> apart? If one is not careful the random distribution produced by the
> application is not the one you thought it would be.
>
As von Neumann's famous quote has it, when you are using pseudo-random
numbers you are "in a state of sin."  You are taking a sequence of numbers
that are anything but random, and simply pretending that they are, in fact,
perfectly random.  You get away with it by carefully considering the
characteristics of that non-randomness and, through careful analysis and
testing, making sure that the non-randomness doesn't matter to what you are
doing with it.  If your application uses random numbers in multiple places
you must be sure that there are no meaningful correlations between the
streams in those different places -- either by using a single pseudo-random
number stream for all of them or by using a set of PRNGs that have been
shown by statistical tests to be independent.
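The first option -- one explicit stream shared by every part of the
application that needs randomness -- might look something like this sketch
(again, the particular Boost.Random types are only illustrative):

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/normal_distribution.hpp>
    #include <boost/random/uniform_real.hpp>
    #include <boost/random/variate_generator.hpp>
    #include <iostream>

    int main()
    {
        const unsigned int seed = 42u;   // recorded with the run

        // One explicit engine; every consumer of randomness draws from
        // this single, well-understood stream.
        boost::mt19937 engine(seed);

        boost::variate_generator<boost::mt19937&, boost::uniform_real<> >
            uniform_draw(engine, boost::uniform_real<>(0.0, 1.0));
        boost::variate_generator<boost::mt19937&,
                                 boost::normal_distribution<> >
            normal_draw(engine, boost::normal_distribution<>(0.0, 1.0));

        std::cout << uniform_draw() << ' ' << normal_draw() << '\n';
    }

Within a single thread this keeps all the analysis on one well-studied
stream; across threads you would instead want separate generators whose
independence you have actually checked.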
It is a fact of life in modern programming that our systems use many
packages and libraries whose precise contents we have no control over and
of which we are frequently ignorant.  The requirement that this puts on us
in using pseudo-random generators is that any correlation (*or side
effect*) that might be caused by such libraries' use of pseudo-random
generators (whether or not we *know* that they use them) should not have
any effect on our results.  The example I gave
-- a PRNG used for controlling shading in a graphics package for 
displaying the results -- is likely to have that characteristic.  Other 
possibilities involve interthread, interprocess or interprocessor 
communication protocols, "random" keys assigned to data-structure nodes 
for hashing and various algorithms that introduce randomness to make 
worst-case performance situations unlikely.  Correlations of a 
simulation's PRNG with the PRNGs used in such packages are unlikely to 
invalidate your results (but you should always, of course, consider the 
possibility that they might -- one reason that you should always repeat 
runs with at least two different PRNGs in any serious simulation).
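The flip side is to keep the simulation's engine as an explicit, local
object, so that nothing a library draws for its own purposes can silently
change how many values *your* code consumes.  A sketch (illustrative types
once more, and the base_seed + 1 offset is a naive scheme whose
independence you would still want to validate):

    #include <boost/random/mersenne_twister.hpp>
    #include <boost/random/uniform_real.hpp>
    #include <boost/random/variate_generator.hpp>
    #include <iostream>

    int main()
    {
        const unsigned int base_seed = 12345u;        // logged with the run

        // Separate engine state per logical stream -- no hidden, shared
        // singleton for another component to reseed or draw from.
        boost::mt19937 sim_engine(base_seed);          // the simulation itself
        boost::mt19937 display_engine(base_seed + 1);  // e.g. dithering/shading

        boost::uniform_real<> unit(0.0, 1.0);
        boost::variate_generator<boost::mt19937&, boost::uniform_real<> >
            sim_draw(sim_engine, unit);
        boost::variate_generator<boost::mt19937&, boost::uniform_real<> >
            display_draw(display_engine, unit);

        // Draws from one stream cannot perturb the sequence of the other.
        std::cout << sim_draw() << ' ' << display_draw() << '\n';
    }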
> As before the application I have in mind is physics simulations. If you
> use random numbers for something else perhaps the needs are different.
On the contrary, that is a prime example of an area where this is an 
absolute requirement.  If you do actual physics experiments in a 
physics lab you keep a careful record of *every* aspect of the 
experiment that you can in order to be able to re-examine, re-analyze 
and replicate the experiment as closely as possible.  Why would the 
fact that your experiment is run in a "virtual lab" where you have the 
capability to easily record and replicate every aspect of the 
experiment so much more precisely lead you to discard one of the basic 
principles of scientific rigor?
Of course, you *could* save the entire stream of pseudo-random numbers used
with every run instead, but it is so much more compact and convenient,
don't you think, to just record the seed and make sure that all the
packages and code necessary are kept in a well-maintained source control
system?
Just to remind you of a rather famous example of this: Lorenz was doing
some simulations of weather systems.  He discovered that when he attempted
to rerun his simulations using the precise starting points he had recorded,
he got entirely different results.  The starting conditions had been
recorded as decimal printouts, introducing a tiny rounding difference -- a
"half-bit" difference in the least significant figure, which nevertheless
resulted in entirely different outcomes.  His investigation of this led to
the (re)discovery of the "butterfly effect", and is generally considered
the beginning of modern chaos theory.
>
> -- 
> Anders Edin, Sidec Technologies AB