Subject: Re: [boost] [Review] Boost.Endian mini-review
From: Peter Dimov (lists_at_[hidden])
Date: 2015-01-23 12:47:02


Joel FALCOU wrote:

> Is the library ready to be added to Boost releases?

Let me preface everything else I'll say with this: I think that the answer to
the above question is "yes", as every question raised during the review
seems to have been addressed.

That said, I want to point out that the library in its current state does
not support one approach to dealing with endianness, which I will outline
below.

Let's take the following hypothetical file format as an example, loosely
lifted from the docs:

[00] code (32 bit big endian)
[04] length (32 bit big endian)
[08] version (16 bit little endian)
[10] shape type (32 bit little endian)
[14] ...

All three approaches that the library supports involve declaring a struct
with the above members and expecting that this struct can be read/written
directly to file, which means that its layout and size must correspond to
the above description.

What I tend to do, however, is rather different. I do declare a
corresponding struct:

struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

but never read or write it directly, which means that I do not need to make
sure that its layout and size are fixed.

Instead, in the function read(h), I do this (pseudocode):

read( header& h )
{
    unsigned char data[ 14 ];
    fread data from file;

    read_32_msb( h.code, data + 0 );
    read_32_msb( h.length, data + 4 );
    read_16_lsb( h.version, data + 8 );
    read_32_lsb( h.shape_type, data + 10 );
}

Note that this does not require the machine to have a 32 bit int or a 16 bit
int at all; int only needs to be wide enough to hold the values. It can be 48
bits wide and even have trap bits. This is, I admit, only of academic
interest today, but still.

The generic implementation of read_32_lsb is:

void read_32_lsb( int & v, unsigned char const data[ 4 ] )
{
    // assemble the value byte by byte, least significant byte first
    unsigned w = data[ 0 ];

    w += (unsigned)data[ 1 ] << 8;
    w += (unsigned)data[ 2 ] << 16;
    w += (unsigned)data[ 3 ] << 24;

    v = w;
}

which works on any endianness.
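
For completeness, here is a minimal sketch of how the whole header from the
example format could be decoded this way. The names load_32_msb, load_32_lsb,
load_16_lsb, header_size and parse_header are hypothetical and not part of
Boost.Endian; the helpers return the value instead of writing through a
reference, and the sketch assumes that int and unsigned are at least 32 bits
wide and that the stored values fit in them.

```cpp
#include <cstddef>

// Hypothetical helpers, not part of Boost.Endian. unsigned long is used
// for the 32 bit loads because it is guaranteed to hold at least 32 bits.
inline unsigned long load_32_msb( unsigned char const * p )
{
    return ( (unsigned long)p[ 0 ] << 24 )
         | ( (unsigned long)p[ 1 ] << 16 )
         | ( (unsigned long)p[ 2 ] << 8 )
         |   (unsigned long)p[ 3 ];
}

inline unsigned long load_32_lsb( unsigned char const * p )
{
    return   (unsigned long)p[ 0 ]
         | ( (unsigned long)p[ 1 ] << 8 )
         | ( (unsigned long)p[ 2 ] << 16 )
         | ( (unsigned long)p[ 3 ] << 24 );
}

inline unsigned load_16_lsb( unsigned char const * p )
{
    return (unsigned)p[ 0 ] | ( (unsigned)p[ 1 ] << 8 );
}

// Same members as the struct in the post; its layout is never relied upon.
struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

// Size of the header as stored in the file, per the format description.
std::size_t const header_size = 14;

// Decodes the 14 file bytes at data into h.
// Assumption: int and unsigned are at least 32 bits wide, and the stored
// values are representable in the destination members.
inline void parse_header( header & h, unsigned char const * data )
{
    h.code       = static_cast<int>( load_32_msb( data + 0 ) );       // 32 bit big endian
    h.length     = static_cast<unsigned>( load_32_msb( data + 4 ) );  // 32 bit big endian
    h.version    = static_cast<int>( load_16_lsb( data + 8 ) );       // 16 bit little endian
    h.shape_type = static_cast<int>( load_32_lsb( data + 10 ) );      // 32 bit little endian
}
```

The caller would fread header_size bytes into an unsigned char buffer and
pass it to parse_header; the in-memory struct can then have whatever padding
and member sizes the compiler prefers.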

This approach - as shown - does have a drawback. If you have an array of
802511 32 bit lsb integers in the file, and the native int is 32 bit lsb,
one read of 802511*4 bytes is vastly superior in performance to a loop that
would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the
above is generally combined with

    void read_32_lsb_n( int * first, int * last, FILE * fp );

but this ties us to using FILE* and is not strictly needed, because the rest
of the library already offers enough options to handle this case
efficiently.
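
To illustrate the bulk case without FILE*, here is a rough sketch of a
buffer-based variant. It is not the read_32_lsb_n declared above: the names
native_int_is_32_lsb and decode_32_lsb_n are hypothetical, and the source is
assumed to be a byte buffer that the caller has already filled with one large
read. When the native int happens to be 32 bit lsb, the sketch copies the
bytes in one go; otherwise it falls back to the byte-by-byte loop.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

// Hypothetical helper: true when int is exactly four 8-bit bytes stored
// least significant byte first.
inline bool native_int_is_32_lsb()
{
    if( CHAR_BIT != 8 || sizeof( int ) != 4 ) return false;

    unsigned const probe = 0x04030201u;
    unsigned char bytes[ sizeof( probe ) ];
    std::memcpy( bytes, &probe, sizeof( probe ) );

    return bytes[ 0 ] == 1 && bytes[ 1 ] == 2 && bytes[ 2 ] == 3 && bytes[ 3 ] == 4;
}

// Hypothetical buffer-based counterpart of read_32_lsb_n: decodes
// (last - first) 32 bit lsb integers from src into [first, last).
// Assumption: src points to at least 4 * (last - first) bytes.
inline void decode_32_lsb_n( int * first, int * last, unsigned char const * src )
{
    std::size_t const n = static_cast<std::size_t>( last - first );
    if( n == 0 ) return;

    if( native_int_is_32_lsb() )
    {
        // Fast path: the file representation matches the native one.
        std::memcpy( first, src, n * 4 );
        return;
    }

    // Portable path: assemble each value from its four bytes.
    for( ; first != last; ++first, src += 4 )
    {
        unsigned long w = src[ 0 ];

        w |= (unsigned long)src[ 1 ] << 8;
        w |= (unsigned long)src[ 2 ] << 16;
        w |= (unsigned long)src[ 3 ] << 24;

        *first = static_cast<int>( w );
    }
}
```

The endianness check is done at run time here only to keep the sketch short;
a real implementation would more likely select the fast path at compile time.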