Subject: Re: [boost] [Review] Boost.Endian mini-review
From: Peter Dimov (lists_at_[hidden])
Date: 2015-01-23 12:47:02
Joel FALCOU wrote:
> Is the library ready to be added to Boost releases?
Let me preface everything else I'll say with this: I think that the answer to
the above question is "yes", as every question raised during the review
seems to have been addressed.
That said, I want to point out that the library in its current state does
not support one approach to dealing with endianness, which I will outline
below.
Let's take the following hypothetical file format as an example, loosely
lifted from the docs:
[00] code (32 bit big endian)
[04] length (32 bit big endian)
[08] version (16 bit little endian)
[10] shape type (32 bit little endian)
[14] ...
All three approaches that the library supports involve declaring a struct
with the above members and expecting that this struct can be read/written
directly to file, which means that its layout and size must correspond to
the above description.
What I tend to do, however, is rather different. I do declare a
corresponding struct:
    struct header
    {
        int code;
        unsigned length;
        int version;
        int shape_type;
    };
but never read or write it directly, which means that I do not need to make
sure that its layout and size are fixed.
Instead, in the function read(h), I do this (pseudocode):
    read( header& h )
    {
        unsigned char data[ 14 ];
        fread data from file;
        read_32_msb( h.code, data + 0 );        // big endian
        read_32_msb( h.length, data + 4 );      // big endian
        read_16_lsb( h.version, data + 8 );     // little endian
        read_32_lsb( h.shape_type, data + 10 ); // little endian
    }
Note that this does not require the machine to have a 32 bit int or a 16 bit
int at all. int can be 48 bits wide and even have padding bits that give rise
to trap representations. Which is, I admit, only of academic interest today,
but still.
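A minimal sketch of the read function described above, decoding from a byte buffer so that no file I/O is involved. The names read_header, load_32_msb, load_32_lsb, load_16_lsb and header_size are hypothetical and chosen only for illustration; the byte orders follow the format description above (code and length big endian, version and shape type little endian).

```cpp
#include <cassert>
#include <cstddef>

struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

// Hypothetical helpers: build the value one byte at a time, so the result
// does not depend on the byte order or the integer widths of the host.
inline unsigned long load_32_msb( unsigned char const * p )
{
    return ( (unsigned long)p[ 0 ] << 24 ) | ( (unsigned long)p[ 1 ] << 16 )
         | ( (unsigned long)p[ 2 ] << 8 )  |   (unsigned long)p[ 3 ];
}

inline unsigned long load_32_lsb( unsigned char const * p )
{
    return ( (unsigned long)p[ 3 ] << 24 ) | ( (unsigned long)p[ 2 ] << 16 )
         | ( (unsigned long)p[ 1 ] << 8 )  |   (unsigned long)p[ 0 ];
}

inline unsigned load_16_lsb( unsigned char const * p )
{
    return ( (unsigned)p[ 1 ] << 8 ) | (unsigned)p[ 0 ];
}

std::size_t const header_size = 14; // size of the header in the file

// Decodes the 14-byte file header stored in data[0..n) into h.
// Assumption: the stored values fit into the destination members.
void read_header( header & h, unsigned char const * data, std::size_t n )
{
    assert( n >= header_size );

    h.code       = (int)load_32_msb( data + 0 );
    h.length     = (unsigned)load_32_msb( data + 4 );
    h.version    = (int)load_16_lsb( data + 8 );
    h.shape_type = (int)load_32_lsb( data + 10 );
}
```

The in-memory struct may contain padding or wider integers; only the 14-byte buffer has to match the file layout.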
The generic implementation of read_32_lsb is:
    void read_32_lsb( int & v, unsigned char data[ 4 ] )
    {
        unsigned w = data[ 0 ];
        w += (unsigned)data[ 1 ] << 8;
        w += (unsigned)data[ 2 ] << 16;
        w += (unsigned)data[ 3 ] << 24;
        v = w;
    }
which works on any endianness.
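The same idea can be applied in the other direction. The following is only a sketch of a possible writing counterpart; the name write_32_lsb and its signature are assumptions, not something taken from the library or from the code above.

```cpp
#include <cassert>

// Hypothetical counterpart to read_32_lsb.
// Stores v into data[0..3], least significant byte first, using only
// shifts and masks, so the host byte order never enters the picture.
void write_32_lsb( unsigned char data[ 4 ], unsigned long v )
{
    assert( v <= 0xFFFFFFFFul ); // the value must fit in 32 bits

    data[ 0 ] = (unsigned char)( v & 0xFF );
    data[ 1 ] = (unsigned char)( ( v >> 8 ) & 0xFF );
    data[ 2 ] = (unsigned char)( ( v >> 16 ) & 0xFF );
    data[ 3 ] = (unsigned char)( ( v >> 24 ) & 0xFF );
}
```

For example, writing 0x12345678 yields the bytes 78 56 34 12 on any host.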
This approach - as shown - does have a drawback. If you have an array of
802511 32 bit lsb integers in the file, and the native int is 32 bit lsb,
one read of 802511*4 bytes is vastly superior in performance to a loop that
would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the
above is generally combined with
void read_32_lsb_n( int * first, int * last, FILE * fp );
but this ties us to using FILE* and, given the rest of the library, is not
strictly needed, because the library already offers enough options to handle
this case efficiently.
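For illustration, a portable sketch of such a bulk function: it issues a single fread for the whole array and then decodes from memory. The helper decode_32_lsb_n and the bool return value are assumptions added here. A platform whose native int is already 32 bit lsb could instead fread straight into the int array, which is the fast path discussed above and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: decodes (last - first) little-endian 32 bit values
// from bytes into [first, last). Assumes the values fit into int.
void decode_32_lsb_n( int * first, int * last, unsigned char const * bytes )
{
    for( ; first != last; ++first, bytes += 4 )
    {
        unsigned long w = bytes[ 0 ];
        w |= (unsigned long)bytes[ 1 ] << 8;
        w |= (unsigned long)bytes[ 2 ] << 16;
        w |= (unsigned long)bytes[ 3 ] << 24;
        *first = (int)w;
    }
}

// Reads the whole array with one fread, then decodes it.
// Returns false on a short read (the return type is an assumption).
bool read_32_lsb_n( int * first, int * last, std::FILE * fp )
{
    assert( first <= last );

    std::size_t const n = (std::size_t)( last - first );
    std::vector<unsigned char> buf( n * 4 );

    if( n != 0 && std::fread( buf.data(), 4, n, fp ) != n )
    {
        return false;
    }

    decode_32_lsb_n( first, last, buf.data() );
    return true;
}
```

Keeping the decoding step separate from the FILE* wrapper means the same loop can be reused with any other source of bytes.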