Subject: Re: [boost] [Review] Boost.Endian mini-review
From: Peter Dimov (lists_at_[hidden])
Date: 2015-01-23 12:47:02
Joel FALCOU wrote:
>    Is the library ready to be added to Boost releases?
Let me preface everything else I'll say with this: I think that the answer to 
the above question is "yes", as every question raised during the review 
seems to have been addressed.
That said, I want to point out that the library in its current state does 
not support one approach to dealing with endianness, which I will outline 
below.
Let's take the following hypothetical file format as an example, loosely 
lifted from the docs:
[00]    code (32 bit big endian)
[04]    length (32 bit big endian)
[08]    version (16 bit little endian)
[10]    shape type (32 bit little endian)
[14]    ...
All three approaches that the library supports involve declaring a struct 
with the above members and expecting that this struct can be read/written 
directly to file, which means that its layout and size must correspond to 
the above description.
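To make the layout requirement concrete, here is a minimal hand-rolled sketch of a struct that mirrors the file byte for byte. It is not the library's API: `raw_header`, its members and `load_raw` are hypothetical names, and the plain byte arrays merely stand in for whatever endian-aware member types the library would provide.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a struct whose in-memory layout matches the file.
struct raw_header
{
    unsigned char code[ 4 ];        // [00] 32 bit big endian
    unsigned char length[ 4 ];      // [04] 32 bit big endian
    unsigned char version[ 2 ];     // [08] 16 bit little endian
    unsigned char shape_type[ 4 ];  // [10] 32 bit little endian
};

// The direct read/write approach only works if these hold.
static_assert( sizeof( raw_header ) == 14, "raw_header must be exactly 14 bytes" );
static_assert( offsetof( raw_header, version ) == 8, "version must sit at offset 8" );
static_assert( offsetof( raw_header, shape_type ) == 10, "shape_type must sit at offset 10" );

// Copies the 14 file bytes straight into the struct, without any conversion.
inline void load_raw( raw_header & h, unsigned char const * data )
{
    std::memcpy( &h, data, sizeof( raw_header ) );
}
```

The static_asserts are the point of the sketch: the struct is usable for direct I/O only as long as the compiler inserts no padding and the member sizes match the format exactly.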
What I tend to do, however, is rather different. I do declare a 
corresponding struct:
struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};
but never read or write it directly, which means that I do not need to make 
sure that its layout and size are fixed.
Instead, in the function read(h), I do this (pseudocode):
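read( header& h )
{
    unsigned char data[ 14 ];
    fread data from file;
    read_32_msb( h.code, data + 0 );         // [00] 32 bit big endian
    read_32_msb( h.length, data + 4 );       // [04] 32 bit big endian
    read_16_lsb( h.version, data + 8 );      // [08] 16 bit little endian
    read_32_lsb( h.shape_type, data + 10 );  // [10] 32 bit little endian
}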
Note that this does not require the machine to have a 32 bit int or a 16 bit 
int at all. int can be 48 bits wide and even have trap representations, which 
is, I admit, only of academic interest today, but still.
The generic implementation of read_32_lsb is:
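void read_32_lsb( int & v, unsigned char const data[ 4 ] )
{
    // assemble the value byte by byte, least significant byte first;
    // assumes unsigned is at least 32 bits wide
    unsigned w = data[ 0 ];
    w += (unsigned)data[ 1 ] << 8;
    w += (unsigned)data[ 2 ] << 16;
    w += (unsigned)data[ 3 ] << 24;
    v = w;
}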
which works on any endianness.
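Putting the pieces together, the following is a self-contained sketch of the same idea applied to the whole 14 byte header, with each field paired with the byte order given in the format description. The names `read_32_msb`, `read_16_lsb` and `decode_header` are assumptions made for illustration, as is the choice to make the helpers templates so that they accept both the int and the unsigned members; the value is assembled in an unsigned long because that type is guaranteed to hold at least 32 bits.

```cpp
struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

// Least significant byte first. Values that do not fit in T are converted
// modulo 2^N (guaranteed since C++20, implementation-defined before that).
template<class T> void read_32_lsb( T & v, unsigned char const * data )
{
    unsigned long w = data[ 0 ];
    w |= (unsigned long)data[ 1 ] << 8;
    w |= (unsigned long)data[ 2 ] << 16;
    w |= (unsigned long)data[ 3 ] << 24;
    v = static_cast<T>( w );
}

// Most significant byte first.
template<class T> void read_32_msb( T & v, unsigned char const * data )
{
    unsigned long w = data[ 3 ];
    w |= (unsigned long)data[ 2 ] << 8;
    w |= (unsigned long)data[ 1 ] << 16;
    w |= (unsigned long)data[ 0 ] << 24;
    v = static_cast<T>( w );
}

// 16 bit, least significant byte first.
template<class T> void read_16_lsb( T & v, unsigned char const * data )
{
    unsigned w = data[ 0 ];
    w |= (unsigned)data[ 1 ] << 8;
    v = static_cast<T>( w );
}

// Decodes the 14 header bytes; 'data' must point to at least 14 bytes.
inline void decode_header( header & h, unsigned char const * data )
{
    read_32_msb( h.code, data + 0 );         // [00] 32 bit big endian
    read_32_msb( h.length, data + 4 );       // [04] 32 bit big endian
    read_16_lsb( h.version, data + 8 );      // [08] 16 bit little endian
    read_32_lsb( h.shape_type, data + 10 );  // [10] 32 bit little endian
}
```

Since decode_header works on a byte buffer, the caller is free to fill that buffer with fread, an istream, a memory-mapped file or a network read; the struct itself is never touched by I/O.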
This approach - as shown - does have a drawback. If you have an array of 
802511 32 bit lsb integers in the file, and the native int is 32 bit lsb, 
one read of 802511*4 bytes is vastly superior in performance to a loop that 
would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the 
above is generally combined with
    void read_32_lsb_n( int * first, int * last, FILE * fp );
but this ties us to using FILE* and, given the rest of the library, is not 
strictly needed, because the library already offers enough options to handle 
this case efficiently.
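For illustration, a bulk variant that avoids the FILE* dependency might look like the sketch below. It is an assumption, not the signature shown above: this `read_32_lsb_n` takes a byte buffer that the caller has already filled with a single read of n*4 bytes, and `native_int_is_32_lsb` is a hypothetical helper that checks at run time whether a plain memcpy is sufficient.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

// True when a native int is stored as exactly four 8 bit bytes,
// least significant byte first.
inline bool native_int_is_32_lsb()
{
    if( CHAR_BIT != 8 || sizeof( int ) != 4 ) return false;

    unsigned const probe = 0x04030201u;
    unsigned char bytes[ sizeof( unsigned ) ];
    std::memcpy( bytes, &probe, sizeof( probe ) );

    return bytes[ 0 ] == 1 && bytes[ 1 ] == 2 && bytes[ 2 ] == 3 && bytes[ 3 ] == 4;
}

// Decodes (last - first) 32 bit lsb integers from 'data' into [first, last).
// 'data' must point to at least 4 * (last - first) bytes.
inline void read_32_lsb_n( int * first, int * last, unsigned char const * data )
{
    std::size_t const n = static_cast<std::size_t>( last - first );
    if( n == 0 ) return;

    if( native_int_is_32_lsb() )
    {
        // fast path: the file representation is the native representation
        std::memcpy( first, data, n * 4 );
        return;
    }

    // portable path: assemble each value byte by byte
    for( ; first != last; ++first, data += 4 )
    {
        unsigned long w = data[ 0 ];
        w |= (unsigned long)data[ 1 ] << 8;
        w |= (unsigned long)data[ 2 ] << 16;
        w |= (unsigned long)data[ 3 ] << 24;
        *first = static_cast<int>( w );
    }
}
```

On a 32 bit lsb machine this still costs one copy more than reading directly into the int array, so it is a compromise: it keeps the decoding independent of the I/O mechanism while doing the I/O itself in one call.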