Subject: Re: [boost] Notice: Boost.Atomic (atomic operations library)
From: Helge Bahmann (hcb_at_[hidden])
Date: 2009-11-30 12:08:50


Hi Phil!

Thanks for your interest, and I appreciate any help with ARM, as I don't have
access to this architecture.

Am Monday 30 November 2009 17:02:14 schrieb Phil Endecott:
[snip]
> Architecture v6 introduced 32-bit load-locked/store-conditional
> instructions. Architecture v7 introduced 16- and 8-bit versions.

The library already has infrastructure in place to emulate 8- and 16-bit
atomics by "embedding" them into a properly aligned 32-bit atomic
(created "on the fly" through appropriate pointer casts). FWIW ppc and Alpha
require this already, as they do not have 8/16-bit ll/sc. This is of course
slower than native 8-/16-bit versions, but is workable.
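
To illustrate the embedding idea, here is a minimal sketch of an 8-bit
fetch-and-add built from a compare-and-swap loop on the containing 32-bit
word. It is not the library's actual code: it uses C++11 std::atomic as a
stand-in, the helper name fetch_add_u8_in_word is hypothetical, and the byte
is selected by a logical lane index rather than by the pointer casts
described above.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: atomically add `delta` to the 8-bit lane `lane` (0..3)
// of a 32-bit word, leaving the other three lanes untouched.
// Returns the previous value of that lane.
inline std::uint8_t fetch_add_u8_in_word(
    std::atomic<std::uint32_t>& word, unsigned lane, std::uint8_t delta,
    std::memory_order order = std::memory_order_seq_cst)
{
    const unsigned shift = lane * 8;
    const std::uint32_t mask = std::uint32_t(0xff) << shift;

    std::uint32_t old_word = word.load(std::memory_order_relaxed);
    for (;;) {
        // Extract the lane, update it with 8-bit wrap-around, and splice it
        // back into an otherwise unchanged copy of the word.
        const std::uint8_t old_byte =
            static_cast<std::uint8_t>((old_word & mask) >> shift);
        const std::uint8_t new_byte =
            static_cast<std::uint8_t>(old_byte + delta);
        const std::uint32_t new_word =
            (old_word & ~mask) | (std::uint32_t(new_byte) << shift);

        // On failure, old_word is refreshed with the current value and the
        // loop retries; this also retries when a neighbouring lane changed.
        if (word.compare_exchange_weak(old_word, new_word, order,
                                       std::memory_order_relaxed))
            return old_byte;
    }
}
```

The retry on unrelated neighbouring bytes is one reason the emulated version
is slower than a native 8-bit operation.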

I will shortly be adding a small howto on adding platform support to the
library.

> ARM Linux has kernel support that provides compare-and-swap even on
> processors that don't support it by guaranteeing to not interrupt code
> in certain address ranges. This has the cost of a function call, i.e.
> it's slower than inline assembler but a lot faster than a system call.
> Kernels that don't support this are now sufficiently old that I think
> they can be ignored. Newer versions of gcc may use this mechanism when
> the atomic builtins are used, but versions of gcc that don't do this
> are sufficiently widespread that they should still be supported
> efficiently.

Are these functions part of libc, glibc or the vdso?

> I believe that OS X on ARM (i.e. the iPhone) always runs on
> architecture v6 or newer. However Apple supply a version of gcc that
> is too old to support ARM atomics via the builtins. The "recommended"
> way to do atomics is via a set of function calls described here:
> http://developer.apple.com/mac/library/documentation/Darwin/Reference/ManPages/man3/atomic.3.html
> I have not looked at what these functions do or tried to benchmark them.
> They are also available on other OS X platforms.

These should easily be usable, but
- the *Barrier versions are still stronger than what is required (see below)
- there are no "Load with Barrier" and "Store with Barrier" operations; these
would have to be emulated with compare_exchange
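
A rough sketch of such an emulation, assuming only a barrier
compare-and-swap that reports success or failure. The names
cas_with_barrier, load_with_barrier and store_with_barrier are hypothetical,
and std::atomic stands in for the OS X calls so that the sketch is portable:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a platform CAS with full barrier that only
// reports success or failure (it does not return the observed value).
inline bool cas_with_barrier(std::int32_t expected, std::int32_t desired,
                             std::atomic<std::int32_t>& target)
{
    return target.compare_exchange_strong(expected, desired,
                                          std::memory_order_seq_cst);
}

// "Load with barrier": read a candidate value without ordering, then confirm
// it with a CAS that replaces the value with itself. The successful CAS
// supplies the barrier.
inline std::int32_t load_with_barrier(std::atomic<std::int32_t>& target)
{
    for (;;) {
        const std::int32_t seen = target.load(std::memory_order_relaxed);
        if (cas_with_barrier(seen, seen, target))
            return seen;
    }
}

// "Store with barrier": retry the CAS until the new value has replaced
// whatever value was current at that moment.
inline void store_with_barrier(std::atomic<std::int32_t>& target,
                               std::int32_t value)
{
    for (;;) {
        const std::int32_t seen = target.load(std::memory_order_relaxed);
        if (cas_with_barrier(seen, value, target))
            return;
    }
}
```

Note that the emulated load performs a write to the cache line, so it is
considerably more expensive than a native load followed by a fence.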

> I note that you don't seem to use the gcc atomic builtins even on
> platforms where they have worked for a while e.g. x86. Any reason for
> that?

On x86 it would not matter; on all other platforms, the intrinsics have the
unfortunate side effect of always acting as (usually bi-directional) memory
barriers. There are, however, legitimate use cases that do not need these
barriers. For example, the following operation (equivalent to
__sync_fetch_and_add):

        atomic<int>::fetch_add(1, memory_order_acq_rel)

is 2 to 3 times slower on ppc than the version not enforcing memory ordering:

        atomic<int>::fetch_add(1, memory_order_relaxed)

If you always use fully-fenced versions, then any lock-free algorithm will
usually be noticeably *slower* than the platform's native mutex lock/unlock
operations (which use only the weakest barriers necessary), making the whole
exercise rather pointless.
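
As a small illustration of choosing the ordering per operation, here is a
sketch of a statistics counter. It is written against C++11 std::atomic
rather than the library under discussion, and the type stats_counter and its
member names are made up for the example:

```cpp
#include <atomic>

// Hypothetical event counter. Only the count itself must be consistent;
// no other data is published through it, so the increment needs no barrier.
struct stats_counter {
    std::atomic<int> hits{0};

    // Cheap path: atomic increment without any ordering constraints.
    void record() { hits.fetch_add(1, std::memory_order_relaxed); }

    // Same increment, but ordered as both acquire and release. Only worth
    // the cost when other memory accesses must be ordered around it.
    int record_ordered() {
        return hits.fetch_add(1, std::memory_order_acq_rel);
    }

    int total() const { return hits.load(std::memory_order_relaxed); }
};
```

Both functions yield the same count; they differ only in which barriers the
compiler has to emit on weakly ordered architectures.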

Cheers, Helge