From: Aaron W. LaFramboise (aaronrabiddog51_at_[hidden])
Date: 2004-07-05 20:46:16


flodin_at_[hidden] wrote:
> It sounds reasonable that the kind of lock described will perform badly in
> high-contention situations. But what evidence do you have that there is any
> probability of high contention, especially when, as you say, the locks are
> short-lived? These locks aren't taken every time the smart pointer is accessed;
> they're only taken when a copy is made. So to get the high contention you speak
> of, you would have to have two threads repeatedly making copies of the "same"
> smart pointer. From where I stand, this seems sufficiently rare that the extra
> performance gain in the common cases is worth it.

[these comments are specific to the win32 spinlock discussed]

I have two specific concerns.

1) Personally, I value predictable performance. I would prefer slight
performance degradation (Note, however, that I am not examining the
actual numbers.) to the case where I may have an occasional very long
wait. (Specific example: If a graphically intensive program is aiming
for 30 frames per second, that is only about 30ms per frame. It would
be very unfortunate if, even occasionally, a single smart pointer copy
accounted for 1/3 of the time it took to render a frame.)

2) In "thundering herd" situations, contention is unavoidable, and
performance may be paramount. The possibility of a single thread
halting many other threads for 10ms on a machine with many processors is
disturbing. (Specific example: In a case where thread is producing work
for many consumer threads on a multiprocessor machine, and obtaining the
work involves copying a smart pointer, it is possible that some amount
of threads will forced to sleep every single time, which might be a
substantial percentage of total job wall clock time.)

In any case, I feel fairly strongly that sleeping is not the right thing
to do. Despite the weakness in appeal to authority, I will note that I
have seen Butenhof make the same assertion on Usenet, surely in a manner
more eloquent than mine.

I previously neglected to suggest alternatives.

1) A generally sound 8-byte mutex could be used. I am not sure how bad
this would be in terms of storage efficiency. It may be possible to
implement it using less storage, but I have no specific information on this.

2) I've seen a shared_ptr count implementation that used
InterlockedIncrement instead of the lwm code. I have not examined this
in detail; however it seems that an approach like this is better in all
respects than trying to manage locks if the only need is to maintain a
count (which is what many mutex primatives are doing anyway). Is there
some reason this cannot be used?

What do you think?

Aaron W. LaFramboise