From: Ray Whitmer (ray_at_[hidden])
Date: 2006-05-05 11:48:30
> Personally, I read this and said, "eeek!" (Well, not literally. My  
> office mates would wonder what was up.) I think any design like  
> this is going to be (at best) very non-portable, and (more likely)  
> subject to breaking at the worst possible moment (i.e. when  
> demonstrating to a customer). Is "relatively" thread-safe really  
> good enough here?
Let me make it clear: I do not want relatively thread-safe. I want completely thread-safe.
It seems to me that boost already relies on an atomic counter for  
shared_ptr.  I do not think the solution would be less portable than  
this.  I certainly want it to run on Windows and Mac, 32 and 64 bit  
as well as major Unix and Linux including certain 32-bit-or-more  
embedded situations.
There may be some hopefully-rare C++ threading situations that are difficult to recover from; I will simply have to gather more information on these and see whether they can be dealt with.
> As a constructive suggestion, if you want to reduce lock conflicts  
> in hash tables, have you considered some form of finer-grained  
> locking? You could, for example, have one lock per "slot" in your  
> hash table. Computing the hash and finding the slot is thread-safe  
> (since it doesn't depend on the state of the hash table), so you  
> could then lock only the slot itself while performing list  
> maintenance or whatever you need to do to read/write the data.
For the far more common readers of the hash table, my intent is not to reduce lock conflicts but to eliminate them altogether, replacing them with something lighter-weight, such as the atomic counter used by shared_ptr, which in the best case takes only a few instructions to execute.  I may have thousands of threads reading from the hashtable at once, and there may never be a natural moment when there are no readers so that I can update the table.  The whole purpose of the threads is to perform operations that rely heavily on the hashtables; practically every other instruction is a lookup.
For what it is worth, finding the slot is not thread-safe, because the table size and the table pointer can change.  List maintenance involves multiple slots, and I do not think finer-grained locks are worthwhile, given the complexity, the deadlock potential, and the likely worse performance they would introduce.
> This would be truly thread-safe but might perform better if you  
> have lots of simultaneous readers/writers. I haven't tested this,  
> but if you do, I'd be interested to hear the results.
I don't think it even begins to be thread-safe, because it does not begin to deal with reallocation of the array itself.  If it did, the fine-grained locks would appear to become worthless, because you would need a high-level lock protecting that allocation, assuming you are going to use locks for readers at all, which seems to me like a bad idea from the start.  There is a reason shared_ptr does not use locks (at least on the best-supported platforms) but uses an atomic count instead.
> PS - I'd be even more worried in Java, btw, since Java opcodes  
> should map to CPU operations less well than a compiled binary. Not  
> to mention that in Java, the portability assumption becomes really  
> critical. Java classes are *supposed* to be portable.
I do not want to get into Java wars here, but Java has to solve the deallocation problem in a thread-safe manner.  When the garbage collector actually reclaims an allocation, there cannot be any possibility that someone else still happens to be using it, or the language is badly broken.  This is the hard problem that must be solved in C++ to make a thread-safe hashtable, and it is one you do not have to deal with in Java.
The deallocation problem is, from everything I have analyzed, the big problem in C++.  When you attempt to deallocate something, for all you know another thread has just loaded its pointer from shared memory and, as soon as it unstalls, will want to increment the reference count, which has not yet been incremented.  So even a smart pointer using an atomic reference count fails: the object is sharable in a multithreaded environment, but the reference itself is not.
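To make that window concrete, here is a rough sketch of the failure (std::atomic is only a stand-in for whatever atomic primitive the platform actually provides, and all the names are made up):

    #include <atomic>

    // Hypothetical node with an intrusive atomic reference count.
    struct Node {
        std::atomic<int> refs;
        Node() : refs(1) {}
        // ... payload ...
    };

    std::atomic<Node*> slot(nullptr);   // one published entry of the table

    // Reader: loads the raw pointer, then tries to claim a reference.
    void unsafe_reader() {
        Node* p = slot.load();
        // <-- if the writer frees *p right here, the next line touches
        //     freed memory: the object was sharable, the reference wasn't
        if (p) {
            p->refs.fetch_add(1);
            // ... use p ...
            if (p->refs.fetch_sub(1) == 1) delete p;
        }
    }

    // Writer: unpublishes the entry and drops its own reference.
    void unsafe_writer(Node* replacement) {
        Node* old = slot.exchange(replacement);
        if (old && old->refs.fetch_sub(1) == 1)
            delete old;                 // may land inside the reader's window
    }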
Here is my first stab at a solution:
Extra data (for each Hashtable):
1. An array of two atomic counts, one active and one inactive.
2. An index that flips between 0 and 1, selecting which count is the active one.
3. A mutex that provides exclusion between writers.
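In rough C++ terms, the bookkeeping might look something like this (the names and the choice of std::atomic/std::mutex are just illustrative stand-ins):

    #include <atomic>
    #include <mutex>

    // Hypothetical per-hashtable bookkeeping for the scheme above.
    struct TableGuard {
        std::atomic<long> readers[2];   // reader counts: one active, one inactive
        std::atomic<int>  index;        // flips between 0 and 1
        std::mutex        writer_mutex; // excludes other writers only

        TableGuard() : index(0) { readers[0] = 0; readers[1] = 0; }
    };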
Each reader behaves as follows:
1. Fetch the index.
2. Increment the corresponding atomic count.
3. If the fetched index is no longer the current index, decrement that atomic count and loop back to step 1.  This is rare and safely eliminates the rare bad cases.
4. The reader now has free access to the entire hashtable structure and is guaranteed, by cooperation of the writer, that the pointers it sees are not stale.  It must only guarantee that any pointer it obtains in this step is either discarded or has its ref count incremented before step 5; this includes the pointer to the allocated resizable array itself.
5. Decrement the same atomic count (of the fetched index) that was incremented in step 2.
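As a sketch building on the hypothetical TableGuard above (memory-ordering details glossed over), the reader side is just a few instructions in the common case:

    // Pin the current epoch before touching the table; returns the index
    // that must later be handed back to read_unlock().
    int read_lock(TableGuard& g) {
        for (;;) {
            int i = g.index.load();        // step 1: fetch the index
            g.readers[i].fetch_add(1);     // step 2: increment its count
            if (i == g.index.load())       // step 3: still the current index?
                return i;                  //   yes: safely registered
            g.readers[i].fetch_sub(1);     //   no (rare): back out and retry
        }
    }

    void read_unlock(TableGuard& g, int i) {
        g.readers[i].fetch_sub(1);         // step 5: leave the epoch
    }

Between read_lock() and read_unlock() (step 4) the reader walks the table freely, taking a ref-counted copy of anything it wants to keep.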
Note that in Java none of this is necessary for readers, because the language guarantees that a reference someone still holds is never deallocated.
Each writer behaves as follows:
1. Obtain an exclusive lock on the hashtable.  This lock only  
excludes other writers, not other readers.
2. Perform changes to the hashtable in a manner that, given atomic loads and stores of aligned pointers, never leaves the table in a state that simultaneous readers cannot use.  This is very easy compared to the allocation issues, and I have been doing it in Java for years; I would be happy to expand on it.  This write operation may include releasing objects with ref-count 1, if this is supposed to be a weak hashtable that does not keep unused objects in the index.
3. This is really part of 2: any step in (2) which destroys a reference keeps a private local copy, to make sure it is not deallocated before all reading threads are finished with it.  If the references are ref-counted, a ref-counted private reference is kept.
4. If no private local pointers/references are being kept, go to step 9.
5. Wait for inactive (1-index) atomic count to go to 0.  This is rare  
and safely eliminates rare bad cases.
6. Change index (index = 1 - index).
7. Wait for inactive (1-index) atomic count to go to 0.  This means  
that all readers who might have a copy of a stale pointer are finished.
8. Release all private local references that have a reference count of 1.  Non-ref-counted references always get released, since they have a single owner.  If this is a weak hashtable, reinsert any references whose reference count is greater than one, since those were found and incremented by readers while we were waiting for them to exit.
9. Release lock.  Done.
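Continuing the same sketch, the writer side might look roughly like this; the weak-hashtable reinsertion of step 8 is left out, and the waits are written as plain spins only for brevity:

    #include <memory>
    #include <vector>

    void writer_update(TableGuard& g) {
        std::lock_guard<std::mutex> lock(g.writer_mutex);    // step 1: writers only

        // Steps 2-3: mutate the table, always leaving it readable, and keep
        // a private reference to every node removed from public view.
        std::vector<std::shared_ptr<void> > retired;
        // ... table-specific changes go here, filling `retired` ...

        if (!retired.empty()) {                              // step 4
            int inactive = 1 - g.index.load();
            while (g.readers[inactive].load() != 0) {}       // step 5: drain stragglers
            g.index.store(inactive);                         // step 6: flip the index
            while (g.readers[1 - inactive].load() != 0) {}   // step 7: old epoch empties
            retired.clear();                                 // step 8: drop private refs
        }
    }                                                        // step 9: lock released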
More notes:
Although we don't want to go out of our way to make the writer inefficient, there is only a single writer at any one time, holding an exclusive lock.  And we do not have to wait for all readers to exit before performing the update; we only need to wait for the current readers to exit.  New readers may enter during that time, and we do not care, because we have already removed the candidates from the publicly visible places in the lists.  This includes the pointer to the array itself, which is generally treated like any other allocated pointer but does not require a reference count.
There are many details I left out that do not directly pertain to the allocation problem, but only to atomically updating the rest of the table so that the reader's view is always consistent.
Ray Whitmer