Subject: Re: [boost] [lockfree::fifo] Review
From: Chris M. Thomasson (cristom_at_[hidden])
Date: 2010-01-20 18:51:31
"Helge Bahmann" <hcb_at_[hidden]> wrote in message 
news:alpine.DEB.1.10.1001191553030.25332_at_m65s28.vlinux.de...
> On Mon, 18 Jan 2010, Chris M. Thomasson wrote:
>
>> "Helge Bahmann" <hcb_at_[hidden]> wrote in message 
>> news:201001181953.33044.hcb_at_chaoticmind.net...
>>> Am Saturday 16 January 2010 03:15:44 schrieb Chris M. Thomasson:
>>>> FWIW Helge, here is a new proxy garbage collector algorithm of mine 
>>>> that I
>>>> am currently experimenting with:
>>>>
>>>> http://groups.google.com/group/comp.programming.threads/browse_frm/thread/a53f24de178b419f
>>>
>>> interesting concept, but AFAICT it shares two undesirable properties 
>>> with my
>>> implementation: Reclamation can be delayed indefinitely if
>>>
>>> - one thread is "stuck" within the critical section (e.g. scheduled 
>>> away)
>>
>> Well, if a thread actually deadlocks or "permanently livelocks" within 
>> the critical section, then no reclamation can occur.
>
> this means that reclamation (and by extension writing with bounded amount 
> of memory) is not wait-free (this is not a problem I think, just pointing 
> out)
Yes; very good point Helge. I simply _failed_ to clarify that the wait-free 
properties come from the fact that I am not using loops (e.g., a CAS loop) to 
acquire/release a reference to a collector object. This has been a point of 
interest in the past over on `comp.programming.threads'. Previous collectors 
used a CAS loop to acquire and/or release a reference. This does not work 
out too well for a reader-writer pattern because only one reader thread can 
gain access while the others fail their CAS and have to try again. This 
lock-free property can result in fairly poor performance for readers. 
Therefore, we have been coming up with wait-free techniques that allow 
readers to gain access without looping around in software.
>>> - threads interlock such that at least one thread is always within the
>>> critical section
>>
>>> Right?
>>
>> No. My proxy algorithm is N = 2, while yours is N = 1 where N is the 
>> number of collector objects. This means that you only have a _single_ 
>> reference count. During periods of load, this single reference count 
>> might never reach zero and the garbage will keep piling up.
>
> [snip explanation]
>
> Okay thanks for your explanation, I did not understand your code correctly 
> the first time, but now things are much more clear.
That's great!  :^)
> The concept is very nice indeed, and I agree that N=1 and N=2 makes quite 
> a difference. I *think* that your approach is much more valuable as a 
> building block for "object-local RCU" for data structures with long-lived 
> read access than ABA prevention for producer/consumer data structures 
> (where the read-side critical section is very short).
I think I agree with that statement.
> Thanks for sharing the code,
You are welcome! If you can think of any optimizations, I would LOVE to hear 
all about them. One thing I can think of would be to remove the need for 
threads to actually destroy objects. This work could be moved to a separate 
thread, or the objects could simply be stuck in a cache to be reused. For 
instance, if the `proxy::release(...)' procedure were turned into a function 
that returned a pointer to a stack of nodes in a quiescent state, then the 
calling thread could decide what to do with them. I think that this would 
make the algorithm more flexible.
Please note that this is only an initial implementation and it's more of a 
proof-of-concept than anything. There are probably many improvements that 
can be made...
;^)
> I can already think of an application for this in my own code :)
Cool. IMVHO, the proxy collector is fairly generic in that you can use it to 
solve memory reclamation in many different types of non-blocking algorithms 
and *usage patterns. I think that if Boost does include a non-blocking 
library, then something along the lines of the collector algorithm should 
probably be included.
[*] - I am experimenting with the collector using the following usage 
pattern. Imagine a database server that receives a heavy, sustained stream 
of queries. Reader threads accept search requests and execute each query 
against a dynamic shared data-structure. Here is high-level pseudo-code of 
how I am constructing the logic of the reader threads:
__________________________________________________________________
#define SYNC_INTERVAL 256
#define DEFER_THRESHOLD 128
typedef proxy<DEFER_THRESHOLD> proxy_type;
static proxy_type g_proxy;
void reader_thread()
{
    // Hold one collector reference across many queries instead of
    // acquiring/releasing per query.
    proxy_type::collector* c = &g_proxy.acquire();
    for (unsigned i = 1 ;; ++i)
    {
        query_type* q = try_to_get_a_query_request();
        if (! q)
        {
            // No work pending: drop the reference before blocking so
            // we do not stall reclamation while we wait.
            g_proxy.release(*c);
            q = wait_for_a_query_request();
            c = &g_proxy.acquire();
        }
        execute_query(q);
        if (! (i % SYNC_INTERVAL))
        {
            // Periodically cycle the reference so collections can
            // make progress.
            g_proxy.release(*c);
            c = &g_proxy.acquire();
        }
    }
    g_proxy.release(*c); /* loop-exit path elided in this sketch */
}
__________________________________________________________________
Instead of acquiring/releasing a collector object for every single query 
execution, this technique amortizes the number of times a reader thread 
needs to acquire/release a collector.
Now, I can think of a possible optimization/enhancement to the proxy 
algorithm itself. Take note that an odd reference count denotes that a 
collection cycle is in progress. Therefore, if a reader thread can simply 
test whether the reference count of its acquired collector object is even, 
then it does not actually "need" to release its reference. If the count is 
odd, then a release is necessary in order to move the current collection 
process along. This would allow the reader threads' acquire/release pattern 
to be amortized across the actual garbage-deferment threshold. I think I am 
going to alter the algorithm in order to support the following usage 
pattern:
__________________________________________________________________
#define DEFER_THRESHOLD 256
typedef proxy<DEFER_THRESHOLD> proxy_type;
static proxy_type g_proxy;
void reader_thread()
{
    proxy_type::collector* c = &g_proxy.acquire();
    for (;;)
    {
        query_type* q = try_to_get_a_query_request();
        if (! q)
        {
            g_proxy.release(*c);
            q = wait_for_a_query_request();
            c = &g_proxy.acquire();
        }
        execute_query(q);
        // Only cycle the reference when a collection is actually in
        // progress (odd reference count).
        if (c->is_collecting())
        {
            g_proxy.release(*c);
            c = &g_proxy.acquire();
        }
    }
    g_proxy.release(*c); /* loop-exit path elided in this sketch */
}
__________________________________________________________________
This would allow reader threads to churn along during periods of load with 
fairly low overhead. I say this because the only times the readers would 
have to release their collector object and re-acquire it are when they must 
explicitly wait for query requests, or when the garbage threshold has been 
reached and a collection cycle has been initiated by a writer thread. There 
are some fairly promising enhancements I think I can make to the collector, 
and that is one of them.
Humm...