Subject: Re: [boost] [Math/Statistical Distributions] Rethinking of distribution template parameters.
From: John Maddock (john_at_[hidden])
Date: 2009-05-21 05:47:07
> This is a feature request for the next version of Math/Statistical
> Distributions lib.
>
> Currently, due to lack of input type information, discrete
> distributions can only be "emulated" by using the discrete_quantile
> policy.
> However, doing so the effective quantile type is still a real type.
>
> In my opinion, this has at least two disadvantages:
I believe your disadvantages are more imagined than real.
> 1. Operations are slow since the underlying quantile type is still
> real. Instead, operations on really integral types are generally
> faster.
Unfortunately there is no way the quantile of a discrete distribution can be 
calculated internally using only integer arithmetic (at least I can't think 
of a case other than maybe the trivial Bernoulli distribution).  Normally 
the result of the quantile is calculated as a real number and then 
rounded according to the policy in effect.  In a few cases the 
result is calculated directly as an integer by summing CDF values 
(hypergeometric for example), but the internal calculations still have to 
be done using reals.
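As a rough illustration of the "summing CDF values" case, here is a minimal 
sketch for a binomial distribution.  It is not the library's code: the 
function name and the simple PMF recurrence are assumptions made for the 
example.  The point is only that the answer is integer-valued while every 
intermediate quantity is a real.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper, not Boost.Math code: returns the smallest k with
// CDF(k) >= p for a binomial(n, success) distribution, found by summing
// PMF terms.  The result is integer-valued but held in a double, and the
// PMF and CDF accumulators are necessarily reals.
double binomial_quantile_by_summation(unsigned n, double success, double p)
{
    assert(success > 0.0 && success < 1.0);
    assert(p >= 0.0 && p <= 1.0);

    double pmf = std::pow(1.0 - success, static_cast<double>(n)); // P(X = 0)
    double cdf = pmf;
    unsigned k = 0;
    while (cdf < p && k < n) {
        // Recurrence: P(X = k+1) = P(X = k) * (n - k)/(k + 1) * s/(1 - s)
        pmf *= static_cast<double>(n - k) / static_cast<double>(k + 1)
             * success / (1.0 - success);
        cdf += pmf;
        ++k;
    }
    return static_cast<double>(k);
}
```

For n = 10 and success = 0.5, the CDF is 386/1024 at k = 4 and 638/1024 at 
k = 5, so a probability of 0.5 gives a quantile of 5.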
There's also no overhead from returning a real type (since it's usually 
returned in a register just like an integer type would be).  There might be a 
tiny overhead if the user then casts to an integer, but if we internalised 
that cast by returning an integer type then everyone would pay that cost no 
matter what the use case :-(
BTW there are a few genuine use cases for returning a real-valued result 
from the quantile of a discrete distribution.
> 2. Quantile comparison might be inaccurate since we are comparing real 
> types
Nope, not if you've requested an integer result (which is the default 
policy), as integers are represented exactly in floating point types, unless 
the integer is so large as to exceed the number of mantissa bits - but then the 
result would likely overflow an integer type anyway.  In fact this is an 
important use case: the ability to return values larger than INT_MAX etc. as 
a real-valued type.
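The exactness claim is easy to check.  The sketch below assumes a 32-bit int 
and an IEEE 754 double (53 mantissa bits); the helper names are made up for 
the example.

```cpp
#include <cassert>
#include <climits>
#include <cmath>
#include <limits>

// Illustrative helpers only; assumes a 32-bit int and an IEEE 754 double.

// First power of two at which a double stops representing every integer:
// 2^digits, where digits is the mantissa width (53 for an IEEE double).
double exact_integer_limit()
{
    return std::ldexp(1.0, std::numeric_limits<double>::digits);
}

// True if converting an int to double and back reproduces the value.
bool int_round_trips(int value)
{
    const double as_real = static_cast<double>(value);
    return static_cast<int>(as_real) == value;
}

// True if adding 1 at the limit is lost to rounding, i.e. the next
// integer is no longer representable.
bool increment_is_lost_at_limit()
{
    const double limit = exact_integer_limit();
    const double next = limit + 1.0;
    return next == limit;
}

void check_exactness()
{
    assert(int_round_trips(INT_MAX));
    assert(int_round_trips(INT_MIN));
    // One past INT_MAX is still exact as a double, but would not fit in an int.
    assert(static_cast<double>(INT_MAX) + 1.0 == 2147483648.0);
    assert(increment_is_lost_at_limit());
}
```

So every int value survives the trip through a double, and a double can also 
hold integer results well beyond INT_MAX before exactness is lost.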
There is one genuine concern here, but it can't be solved by your interface: 
namely, if the result of the quantile function is calculated to be very 
close to an integer value, then due to the usual rounding errors in 
calculation we can't be sure which side of the integer the true value lies. 
Unfortunately there is simply no way around this - we have to use 
real-valued types in the internal calculation, and all the stats packages 
I'm aware of have the same potential issue.
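To make the concern concrete, the sketch below rounds down two adjacent 
doubles, one just below an integer and the integer itself.  The names are 
hypothetical and floor stands in for whatever rounding the policy applies; 
it only shows that an error of a single unit in the last place is enough to 
change the integer result.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Results of rounding down the double just below an integer-valued
// target, and the target itself.  Illustration only.
struct NeighbourResults {
    double below; // floor of the largest double less than target
    double at;    // floor of target
};

NeighbourResults floor_of_neighbours(double target)
{
    assert(target == std::floor(target)); // target must be integer-valued
    const double just_below =
        std::nextafter(target, -std::numeric_limits<double>::infinity());
    return { std::floor(just_below), std::floor(target) };
}
```

For a target of 3.0 the two results are 2 and 3, even though the inputs 
differ by only one representable step.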
Cheers, John.