Subject: [boost] [Endian] Performance
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2011-09-05 18:38:15


Dear All,

I've just done some quick benchmarks of Beman's proposed byte-swapping
code using the following test program:

#include <cstdint>
#include <iostream>

#include <arpa/inet.h> // for htonl()

using std::uint32_t;

static inline uint32_t byteswap(uint32_t src)
{

#if defined(USE_GCC_BUILTIN)

   return __builtin_bswap32(src);

#elif defined(USE_HTONL)

   return htonl(src);

#elif defined(USE_REV_INSTR)

   uint32_t target;
   __asm__ ( "rev %0, %1\n"
             : "=r" (target)
             : "r" (src)
           );
   return target;

#elif defined(USE_BEMAN)

   const char* s (reinterpret_cast<const char*>(&src));
   uint32_t target;
   char * t (reinterpret_cast<char*>(&target) + sizeof(target) - 1);
   *t = *s;
   *--t = *++s;
   *--t = *++s;
   *--t = *++s;
   return target;

#else
#error "Define USE_*"
#endif

}

int main()
{
   uint32_t s = 0;
   for (uint32_t i = 0; i < 1000000000; ++i) {
     s += byteswap(i);
   }

   std::cout << s << "\n"; // Expect 506498560
}

I've tested this on two systems:
(A): Marvell ARMv5TE ("Feroceon") @ 1.2 GHz, g++ 4.4
(B): Freescale ARMv7 (i.MX53, Cortex-A8) @ 1.0 GHz, g++ 4.6.1

Compiled with: --std=c++0x -O4

Execution times in seconds are:

                    A       B
USE_GCC_BUILTIN    29.4     3.0
USE_HTONL          12.6     3.0
USE_REV_INSTR       n/a     3.0
USE_BEMAN          17.6     8.0

ARM architecture version 6 (IIRC) introduced a "rev" byte-swapping
instruction. On system B, which supports it, both the gcc builtin and
glibc's htonl() use it, and both are faster than Beman's code (3.0 s
vs. 8.0 s). On system A, which does not support the instruction, the
gcc builtin makes a library call; I've not looked at what htonl() does
there.
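
For comparison, a portable shift-and-mask variant could be added to the
test program as a further case. This is only a sketch: the name
byteswap_shift is hypothetical, it is not part of Beman's proposal, and
it has not been timed on either system above. Some compilers may
recognise the pattern and emit a single byte-swap instruction where the
target has one, but whether g++ 4.4 or 4.6.1 does so here is untested.

```cpp
#include <cstdint>

// Hypothetical extra variant (not measured above): reverse the four
// bytes of a 32-bit value using only shifts and masks, so it does not
// depend on a compiler builtin, inline assembly or <arpa/inet.h>.
static inline std::uint32_t byteswap_shift(std::uint32_t src)
{
   return  (src >> 24)                   // byte 3 -> byte 0
        | ((src >>  8) & 0x0000FF00u)    // byte 2 -> byte 1
        | ((src <<  8) & 0x00FF0000u)    // byte 1 -> byte 2
        |  (src << 24);                  // byte 0 -> byte 3
}
```

Unlike htonl(), this swaps unconditionally regardless of host byte
order; for example, byteswap_shift(0x11223344) yields 0x44332211.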

What do people see on other platforms?

How important do we consider performance to be for this library?

Regards, Phil.