From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2007-09-06 13:44:40
Hi Peter,
Peter Dimov wrote:
> Phil Endecott:
>> I note that shared_ptr uses architecture-specific assembler for the
>> atomic operations needed for thread safe operations, on x86, ia64 and
>> ppc; it falls back to pthreads for other architectures. Has anyone
>> quantified the performance benefit of the assembler?
>>
>> Assuming that the benefit is significant, I'd like to implement it for
>> ARM. Has anyone else looked at this?
>>
>> ARM has a swap instruction. I have a (very vague) recollection that
>> perhaps some of the newer chips have some other locked instructions
>> e.g. test-and-set, but I would want to code to the lowest common
>> denominator i.e. swap only. Is this sufficient for what shared_ptr wants?
>>
>> I note that since 4.1, gcc has provided built-in functions for atomic
>> operations. But it says that "Not all operations are supported by all
>> target processors", and the list doesn't include swap; so maybe this
>> isn't so useful after all.
>
> Can you try the SVN trunk version of shared_ptr and look at the assembly?
> detail/sp_counted_base.hpp should choose sp_counted_base_sync.hpp for g++
> 4.1 and higher and take advantage of the built-ins.
Well it's quicker for me to try this:
int x;
int main(int argc, char* argv[])
{
__sync_fetch_and_add(&x,1);
}
$ arm-linux-gnu-g++ --version
arm-linux-gnu-g++ (GCC) 4.1.2 20061028 (prerelease) (Debian 4.1.1-19)
$ arm-linux-gnu-g++ -W -Wall check_sync_builtin.cc
check_sync_builtin.cc:3: warning: unused parameter ‘argc’
check_sync_builtin.cc:3: warning: unused parameter ‘argv’
/tmp/ccwWxfsT.o: In function `main':
check_sync_builtin.cc:(.text+0x20): undefined reference to `__sync_fetch_and_add_4'
collect2: ld returned 1 exit status
(It does compile on x86, and the disassembly includes a "lock addl" instruction.)
As I mentioned before, gcc doesn't implement these atomic builtins on
all platforms, i.e. it doesn't implement them on platforms where the
hardware doesn't provide them. I don't fully understand how this all
works in libstdc++ (there are too many levels of #include and #if for
me to follow) but there seems to be a __gnu_cxx::__mutex that they can
use in those cases.
> To answer your question: no, a mere swap instruction is not enough for
> shared_ptr, it needs atomic increment, decrement and compare and swap.
Well, I think you can implement a spin-lock mutex with swap:
int mutex=0; // 0 = unlocked, 1 = locked
void lock() {
  int n; // declared outside the loop so the while condition can see it
  do {
    n=1;
    swap(mutex,n); // atomic swap instruction
  } while (n==1); // if n is 1 after the swap, the mutex was already locked
}
void unlock() {
  mutex=0;
}
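For illustration, here is the same idea as a self-contained sketch that uses std::atomic<int>::exchange as a stand-in for the swap instruction. This is an assumption on my part: std::atomic comes from C++11 and is not something gcc 4.1 provides, and the names SwapSpinLock and try_lock are made up for the example. The acquire/release orderings are there because a plain "mutex=0" store gives no ordering guarantee by itself.

```cpp
#include <atomic>
#include <thread>

// Hypothetical spin lock built only on atomic exchange (the "swap" primitive).
class SwapSpinLock {
    std::atomic<int> state_{0};  // 0 = unlocked, 1 = locked
public:
    // One swap attempt: write 1 and look at the old value.
    // If the old value was 0, the lock was free and is now ours.
    bool try_lock() {
        return state_.exchange(1, std::memory_order_acquire) == 0;
    }

    // Keep swapping until we are the one who changed 0 to 1.
    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();  // let the current holder make progress
        }
    }

    // Releasing is a plain atomic store of 0.
    void unlock() {
        state_.store(0, std::memory_order_release);
    }
};
```

A failed attempt is harmless: it overwrites a 1 with a 1, so the lock state does not change.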
So you could use something like that to protect the reference counts,
rather than falling back to the pthread method. Or alternatively,
could you use a sentinel value (say -1) in the reference count to
indicate that it's locked:
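int refcount;
int read_refcount() {
  int r; // declared outside the loop so the while condition can see it
  do {
    r = refcount;
  } while (r==-1); // -1 means someone is updating the count; retry
  return r;
}
int adj_refcount(int adj) {
  int r=-1;
  do {
    swap(refcount,r); // atomic swap: store -1, fetch the previous value
  } while (r==-1); // got -1 back: someone else holds it, so retry
  refcount = r+adj; // storing the new count also releases the "lock"
  return r+adj;
}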
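A compilable sketch of the sentinel scheme, again using std::atomic<int>::exchange in place of the swap instruction. Both std::atomic (C++11) and the class name SentinelRefCount are assumptions made for the example, not anything in shared_ptr. It also assumes that -1 can never be a legitimate count.

```cpp
#include <atomic>

// Hypothetical reference count that uses -1 as a "locked" sentinel.
class SentinelRefCount {
    std::atomic<int> count_;
public:
    explicit SentinelRefCount(int initial) : count_(initial) {}

    // Spin until the stored value is not the sentinel, then return it.
    int read() const {
        int r;
        do {
            r = count_.load(std::memory_order_acquire);
        } while (r == -1);
        return r;
    }

    // Swap in the sentinel to take ownership of the real count, then
    // store the adjusted value, which also releases the "lock".
    int adjust(int adj) {
        int r;
        do {
            r = count_.exchange(-1, std::memory_order_acquire);
        } while (r == -1);  // got the sentinel back: another thread owns it
        count_.store(r + adj, std::memory_order_release);
        return r + adj;
    }
};
```

Every reader and writer spins while the sentinel is in place, so a thread that is preempted between the swap and the store stalls all the others until it runs again.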
(BTW, for gcc>=4.1 on x86 would you plan to use the gcc builtins or the
existing Boost asm?)
Regards,
Phil.