From: Peter Dimov (pdimov_at_[hidden])
Date: 2024-12-10 14:18:25


Andrey Semashev wrote:
> Above, I have intentionally omitted the constructor from uint64_t as I find it
> confusing. In particular, it isn't clear what endianness it uses to seed the
> algorithm. It looks like it duplicates hash_append.

It doesn't use any endianness. It's just used as a seed from which to derive the
initial state.

Most, if not all, existing non-cryptographic algorithms take integral seeds.

> The documentation should provide a more detailed description of the
> HashAlgorithm::result_type type. Specifically, what interface and operations it
> must support. Ideally, the library should provide a concept, similar to
> HashAlgorithm. Currently, the documentation only says it can be an unsigned
> integer type or a "std::array-like" type, where the latter is rather nebulous.

Hash2 started out as a C++03 library, and mandated use of boost::array. Then
it became a C++11 library, and mandated use of std::array. But then it became
a C++14 constexpr library, and std::array didn't work, so now it mandates the
nebulous "std::array-like" type (which is not yet completely nailed down, but
will crystallize with practice.)

> Also on the topic of get_integral_result for "std::array-like" hash values, a few
> comments:
>
> - It requires the static size of 8 regardless of the size of the requested integer,
> so even if uint32_t is requested it will still require at least std::array<unsigned
> char, 8>. This should not be necessary.

I'm not yet clear on the practical utility of using a result_type that is an array
of fewer than 8 bytes. We don't have any such examples yet, so it's hard to
tell whether this makes any sense and whether it should be explicitly blessed,
and to what extent.

> - It ignores the data beyond the initial 8 bytes. It might be ok for the purpose
> of slicing the hash value, but it is inconsistent with get_integral_result for
> integers, which attempts to mix bits through multiplications with a constant.

This is explicitly stated in the documentation of get_integral_result, in the
Remarks.

In practice it's a useful heuristic to assume that an array-like result implies
a hash function of high quality (an avalanching one) so that picking any M
bits would suffice.

Not making this assumption would make things a bit hairy; it's not quite
clear how to turn an arbitrary number of not-high-quality bits into 32. The
library tries its best for the integral types, but there the possibilities are at
least a finite amount and multiplications are built-in.

> It might be possible to reuse one of the hash algorithms implemented in
> the library for this bit mixing.

Interesting "meta" approach, but not necessarily better in practice than
the current one, under any practical criteria.

> So I think, get_integral_result and HashAlgorithm::result_type should be
> better defined by the library.

It most likely will be, eventually, once we have near-100% clarity on how
precisely it needs to be defined.

> The hmac_* type aliases are not provided for siphash and xxhash. But I would
> actually prefer that those type aliases be removed for the rest of the hash
> algorithms, so that the algorithms don't bring the dependency on hmac.hpp,
> which is not always needed by the user. Typing hmac<sha2_256> instead of
> hmac_sha2_256 is hardly any more tedious.

hmac.hpp is a very light header, so...

> Thank you Peter for submitting the library and Matt for managing the review.

Thanks, Andrey, for your review.