From: Chris Thomasson (cristom_at_[hidden])
Date: 2006-10-21 23:30:55
"Peter Dimov" <pdimov_at_[hidden]> wrote in message 
news:000601c6f582$10c92890$6507a8c0_at_pdimov2...
> Chris Thomasson wrote:
>> The basic technique for 'copying' is as follows:
[...]
> I thought that that might be the case. Won't you need to undo the XADD
> before unlocking and retrying, though? That, or use CAS.
I could use CAS; however, I wanted to try and come up with an algorithm that 
can avoid CAS...
> The decrement thread may not care, but if another increment thread gets in 
> first, bad
> things happen; or am I missing something else?
This is where the reload-and-compare logic comes into play. It basically 
boils down to a simple coherent lock-based snapshot algorithm: load before 
lock, reload after lock, and compare. If they are equal, you know the previous 
load is coherent with the value you grabbed under the protection of the lock; 
if not, unlock and retry... This will detect ALL swaps on shared locations 
that jump in just before we lock; I guess it's similar logic to the 
double-check in SMR... If SMR did not perform that double-check, it would be 
broken beyond repair... Just like my algorithm would be. The snapshot logic 
does not touch any part of a refcount object; it operates on the shared 
locations that contain pointers to refcount objects.
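For what it's worth, here is a minimal sketch of how the copy path could look 
under that scheme. Everything here is made up for illustration: the hashed 
lock table, the 'refcount' layout, and the fact that the increment happens 
under the per-object lock instead of via XADD; std::atomic and std::mutex are 
just stand-ins for the raw loads/stores and the spinlock, so the snippet is 
self-contained:

#include <atomic>
#include <cstddef>
#include <mutex>

struct refcount {
    long count;                    // only touched while holding the hashed lock
    void (*dtor)(refcount*);       // user-provided destructor
};

std::mutex g_locks[64];            // hashed lock table, keyed on the object address

inline std::mutex& lock_for(refcount* obj) {
    return g_locks[(reinterpret_cast<std::size_t>(obj) / sizeof(void*)) % 64];
}

// 'copy' a reference out of a shared location
refcount* copy(std::atomic<refcount*>& loc) {
    for (;;) {
        refcount* local = loc.load();                  // load before lock
        if (!local) return 0;
        std::lock_guard<std::mutex> guard(lock_for(local));
        if (loc.load() != local) continue;             // reload after lock + compare; on a miss, unlock and retry
        ++local->count;                                // coherent snapshot: safe to take the reference
        return local;
    }
}

Note that the pointer is never dereferenced until the reload-and-compare 
succeeds, which is the whole point of the snapshot.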
It is not possible for an increment thread to load a pointer to a refcount 
object after it drops to zero, except in the ABA condition I described 
earlier; the algorithm is compatible with ABA.
A refcount object dropping to zero means that there are no shared locations 
that contain any pointers to it. Any increment threads that have loaded 
pointers will fail when they reload-and-compare. If one gets through and its 
compare succeeds, it doesn't matter, because it has already locked the 
associated spinlock, so it has exclusive access to the refcount. Since its 
reload-and-compare succeeded, it happened before any decrement thread got to 
execute; swaps happen before decrements (e.g., swap the shared loc with 0, 
then dec the old ptr). A decrement thread will always lock the spinlock before 
it calls the destructor, so it will wait for the one that got through the 
compare logic...
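Here is the decrement side, to show the ordering I mean, with the same 
made-up names as the copy sketch above; for simplicity this one just calls 
the destructor while the lock is held, which is exactly the thing I nit-pick 
about below:

// drop the reference held in a shared location: swap first, then decrement
void drop(std::atomic<refcount*>& loc) {
    refcount* old = loc.exchange(0);   // swap the shared loc with 0...
    if (!old) return;
    std::lock_guard<std::mutex> guard(lock_for(old));
    if (--old->count == 0)             // ...then dec the old ptr, under the lock
        old->dtor(old);                // any copier that won its compare was serialized ahead of us by this lock
}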
One thing I forgot to mention in my last post, just nit-picking: your 
algorithm sketch invoked the user-provided destructor inside the 
critical section provided by the lock that is associated with the count ptr. 
Can't call function pointers while you hold a lock?   ;)
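For completeness, here is how I would hoist the callback out of the critical 
section in the sketch above: only the bookkeeping runs under the lock, and 
the user-provided destructor runs after the unlock (again, made-up names, 
just an illustration):

void drop_deferred(std::atomic<refcount*>& loc) {
    refcount* old = loc.exchange(0);
    if (!old) return;
    bool last = false;
    {
        std::lock_guard<std::mutex> guard(lock_for(old));
        last = (--old->count == 0);    // only the count is touched under the lock
    }
    if (last)
        old->dtor(old);                // user callback runs with no lock held
}

Once the count has hit zero under the lock and the shared location no longer 
holds the pointer, no copier can pass its compare, so running the destructor 
after the unlock is safe.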
Humm... I have to admit that if I used CAS, the algorithm logic would be 
easier to understand...
;)
>> IMHO, anytime you can reduce the number of atomic operations and/or 
>> memory
>> barriers is a plus...
>
> Maybe... Given a choice between the two, I would prefer to somehow 
> eliminate
> the spinlock in 'copy', though. :-)
You can get rid of it by using PDR... I have a hybrid version of my 
algorithm that uses it. Things get tricky here because of all of the RCU and 
SMR patents, not to mention my vZOOM patent... I think Boost could make use 
of one of the following collectors:
http://groups.google.com/group/comp.programming.threads/msg/f443b38cf7bbca8a
http://groups.google.com/group/comp.programming.threads/msg/a4a796e25a157ca1
One of them uses DWCAS... I have a workaround: you can make use of the offset 
trick explained here:
http://groups.google.com/group/comp.arch/msg/a3ebfe80363a4399
The other one uses CAS; however, it operates on a fixed number of threads. I 
also have a workaround using an offset trick, or another technique that 
allows for X number of threads to acquire a proxy reference without 
blocking...
This is good because reads and writes can be lock-free, and reads and writes 
no longer mutually exclude each other... Humm... I don't have docs on the 
algorithm... Wonder if I should post it; it's experimental...