Subject: Re: [boost] [histogram] Variance
From: Hans Dembinski (hans.dembinski_at_[hidden])
Date: 2018-09-18 07:41:25


Dear Bjørn,

> On 17. Sep 2018, at 22:08, Bjorn Reese via Boost <boost_at_[hidden]> wrote:
>
> The variance of individual bins can be obtained when using the
> adaptive_storage (via h.at(i).variance().)
>
> I am trying to understand the overhead of this feature.
>
> If I interpret the code correctly, there is a space overhead because each counter has to keep track of both the count and the sum of squares.
> The computational overhead is that the sum of squares has to be
> calculated for each insertion. Is this correct?
>
> If so, is there any way to use the adaptive storage policy without
> variance?

There is a minor overhead in the return value. Whenever you query the adaptive_storage, it returns two doubles, one for the value and one for the variance. This is slightly wasteful if you don't care about the variance, since then you would need only one double. I don't know how smart compilers are in this case; they may even remove the code that fills the second double when it is not used. In memory, the adaptive_storage uses only a single integer for each counter if you don't use weighted fills: with unit weights, the sum of squared weights equals the count, so no separate sum of squares has to be stored or updated.
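To illustrate the idea, here is a minimal sketch of a storage that keeps one integer per bin until the first weighted fill and only then switches the whole buffer to (sum of weights, sum of squared weights) pairs. The names toy_adaptive_storage, weighted_sum, value_and_variance, increase and add are hypothetical and made up for this example; this is not the Boost.Histogram adaptive_storage code, only an assumption-laden illustration of the behaviour described above (C++17).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical types for illustration only; not Boost.Histogram's implementation.
struct weighted_sum {
  double sum_w = 0;   // sum of weights
  double sum_w2 = 0;  // sum of squared weights
};

// Every query returns both numbers, even if the caller needs only the value.
struct value_and_variance {
  double value;
  double variance;
};

class toy_adaptive_storage {
 public:
  explicit toy_adaptive_storage(std::size_t n)
      : buffer_(std::vector<std::uint64_t>(n, 0)) {}

  // Unweighted fill: while no weighted fill has happened, one integer per bin.
  void increase(std::size_t i) {
    if (auto* counts = std::get_if<std::vector<std::uint64_t>>(&buffer_)) {
      ++(*counts)[i];
    } else {
      auto& ws = std::get<std::vector<weighted_sum>>(buffer_)[i];
      ws.sum_w += 1;
      ws.sum_w2 += 1;
    }
  }

  // Weighted fill: convert the whole buffer once, then track w and w^2 sums.
  void add(std::size_t i, double w) {
    if (auto* counts = std::get_if<std::vector<std::uint64_t>>(&buffer_)) {
      std::vector<weighted_sum> converted;
      converted.reserve(counts->size());
      for (std::uint64_t c : *counts) {
        const double d = static_cast<double>(c);
        converted.push_back({d, d});  // unit weights: sum of w^2 equals count
      }
      buffer_ = std::move(converted);  // 'counts' is not used after this
    }
    auto& ws = std::get<std::vector<weighted_sum>>(buffer_)[i];
    ws.sum_w += w;
    ws.sum_w2 += w * w;
  }

  value_and_variance at(std::size_t i) const {
    if (auto* counts = std::get_if<std::vector<std::uint64_t>>(&buffer_)) {
      const double c = static_cast<double>((*counts)[i]);
      return {c, c};  // variance is derived from the count, not stored
    }
    const auto& ws = std::get<std::vector<weighted_sum>>(buffer_)[i];
    return {ws.sum_w, ws.sum_w2};
  }

  bool is_weighted() const {
    return std::holds_alternative<std::vector<weighted_sum>>(buffer_);
  }

 private:
  std::variant<std::vector<std::uint64_t>, std::vector<weighted_sum>> buffer_;
};
```

Switching the whole buffer, rather than each cell, is what keeps an unweighted histogram at one integer per bin in this sketch; the price is a one-time conversion on the first weighted fill.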

Returning two doubles when one would suffice is a minor overhead, but if this bothers people, I could add a compile-time option to the adaptive_storage class that turns off all weight handling.
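One possible shape for such a compile-time option is sketched below: a bool template parameter selects whether a query returns a value/variance pair or a single double. The class toy_counting_storage and the parameter WithVariance are hypothetical and do not exist in Boost.Histogram; this only shows how the return type could be chosen at compile time (C++17, using if constexpr and std::conditional_t).

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical types for illustration only; no such option exists in the library.
struct value_and_variance {
  double value;
  double variance;
};

template <bool WithVariance>
class toy_counting_storage {
 public:
  // Return type picked at compile time: two doubles, or just one.
  using result_type =
      std::conditional_t<WithVariance, value_and_variance, double>;

  explicit toy_counting_storage(std::size_t n) : counts_(n, 0) {}

  // Only unweighted fills are supported: weight handling is switched off.
  void increase(std::size_t i) { ++counts_[i]; }

  result_type at(std::size_t i) const {
    const double c = static_cast<double>(counts_[i]);
    if constexpr (WithVariance) {
      return {c, c};  // unit weights: variance estimate equals the count
    } else {
      return c;       // no second double to fill
    }
  }

 private:
  std::vector<std::uint64_t> counts_;
};
```

With toy_counting_storage<false>, the second double is never created, so nothing depends on the optimizer removing it.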

> Furthermore, why does variance() return the sum of squares? Should this
> not be divided by the sample size?

This was already answered by Steven (thanks!).

Kind regards,
Hans