From: Topher Cooper (topher_at_[hidden])
Date: 2007-01-31 10:40:23
At 03:58 PM 1/30/2007, you wrote:
>For now it seems that you use the default variance implementation
>which should be the naive estimator from the sum and sum of squares.
Why would you supply a poor implementation as the default when the
alternative (West's algorithm) is efficient, easily implemented, and
much more precise?  The only reasons I can think of for including the
naive algorithm at all are those rare cases where a small increase
in performance is more important than a potentially very large loss
in precision, or cases where you are using exact arithmetic (e.g., if
instead of floating point you are using rational numbers, or if all
your values are integers).
The "pathological cases" where the naive algorithm does poorly are
those where the sum of squares (and the square of the sum) is large
relative to the variance, and this is very frequently the case.  It
can occur when the variance is small relative to the mean or when
there are more than a few terms involved.  How small relative to the
mean, or how many terms, depends on how much precision you really
care about in your variance.  Because we are dealing with squares,
the error mounts pretty quickly.
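
To make the cancellation concrete, here is a minimal sketch of the naive estimator, assuming IEEE 754 double precision. The function name naive_variance is hypothetical and chosen only for illustration; it is not the library's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive estimator (illustrative only): accumulate the sum and the sum of
// squares, then subtract. The subtraction cancels catastrophically when
// both terms are huge compared with the variance itself.
double naive_variance(const std::vector<double>& x) {
    assert(x.size() > 1);  // the unbiased estimator needs at least 2 values
    double sum = 0.0;
    double sum_sq = 0.0;
    for (double v : x) {
        sum += v;
        sum_sq += v * v;
    }
    const double n = static_cast<double>(x.size());
    // (sum of squares - (square of the sum)/n) / (n - 1)
    return (sum_sq - sum * sum / n) / (n - 1.0);
}
```

For the values 4, 7, 13, 16 this returns 30.  Shift the same values up by 1e9 and the true variance is still 30, but each square is now near 1e18, where adjacent doubles are 128 apart, so the information needed to recover 30 is already lost before the subtraction happens.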
For the record:
West's algorithm (for k = 1, ..., n, starting from M[0] = 0 and T[0] = 0):
t1 = x[k] - M[k-1];
t2 = t1/k;
M[k] = M[k-1] + t2;
T[k] = T[k-1] + (k-1)*t1*t2;
Mean{x[1]...x[n]} = M[n]
Var{x[1]...x[n]} = T[n]/m
where m = (n-1) for the unbiased estimator of the population variance,
   or m = n for the minimal-variance estimator of the population
   variance.
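
The recurrence above can be sketched in C++ roughly as follows.  The struct name RunningVariance and its members are hypothetical, picked only to mirror the M, T, t1 and t2 of the pseudocode; this is not how any particular library spells it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical one-pass accumulator following the recurrence above.
struct RunningVariance {
    std::size_t k = 0;  // number of samples seen so far
    double M = 0.0;     // running mean, M[k]
    double T = 0.0;     // running sum of squared deviations, T[k]

    void add(double x) {
        ++k;
        const double t1 = x - M;                        // x[k] - M[k-1]
        const double t2 = t1 / static_cast<double>(k);  // t1 / k
        M += t2;                                        // M[k] = M[k-1] + t2
        T += static_cast<double>(k - 1) * t1 * t2;      // T[k] = T[k-1] + (k-1)*t1*t2
    }

    double mean() const {
        assert(k > 0);
        return M;
    }

    // unbiased == true divides by m = n - 1; false divides by m = n.
    double variance(bool unbiased = true) const {
        assert(k > (unbiased ? 1u : 0u));
        const double m = static_cast<double>(unbiased ? k - 1 : k);
        return T / m;
    }
};
```

Feeding it 1e9+4, 1e9+7, 1e9+13, 1e9+16 one at a time via add() gives a mean of 1e9+10 and an unbiased variance of 30, because only small differences from the running mean are ever squared.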
Topher Cooper