Subject: Re: [boost] [thread 1.48] Multiple interrupt/timed_join leads to deadlock
From: Vicente J. Botet Escriba (vicente.botet_at_[hidden])
Date: 2012-12-12 02:33:26


On 12/12/12 00:32, Gaetano Mendola wrote:
> On 11/12/2012 21.23, Vicente J. Botet Escriba wrote:
>> On 11/12/12 19:12, Gaetano Mendola wrote:
>>> On 11/12/2012 16.13, Vicente Botet wrote:
>>>> Gaetano Mendola-3 wrote
>>>>>
>>>>> 1) Thread group is now thread-safe; it can be used concurrently by
>>>>> multiple threads
>>>>
>>>> Why should a thread group be inherently thread-safe? It seems to me
>>>> that having a thread container is already useful.
>>>
>>> It can handle, without pestering the developers, the case where one
>>> entity spawns a batch of threads and then waits for their completion
>>> on join(), while another entity (a user interface, for example) can
>>> stop the whole process if it is taking too much time. Otherwise, as
>>> soon as someone performs a boost::thread_group::join(), nothing can
>>> be done from outside to stop the process. It seems a natural use to me.
>> OK, I think I understand your use case. Here is an alternative that
>> doesn't use any mutex to protect the group of threads.
>> I will choose an owner of all these threads and insert them into a
>> container. Only this thread is able to join/interrupt the threads.
>> I will use some way to transfer the request from the user-interface
>> thread to the owner that this is taking too much time (using an
>> atomic?). The owner will try to join each thread using try_join_until
>> with the desired expiration time. If the thread is joined, it is
>> removed from the container. If there is a timeout, the owner will
>> check the 'take_too_much_time' state and, if it is set, will interrupt
>> all the other threads and then join all of them. As you can see, the
>> contention is reduced.
>>
>> Note that this is a specific behavior that cannot be added to the
>> thread_group class. I would be in favor of adding an algorithm/free
>> function that tries to join the threads in a container/range for a
>> given duration or until an expiration time (removing the joined
>> threads).
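To make that concrete, such a free function could look roughly like this
(only a sketch, not a committed interface: the name try_join_all_until is
invented for the example, I use std::list just to have something concrete,
and I assume C++11 so that boost::thread can be stored by value in the
container):

   #include <boost/thread/thread.hpp>
   #include <boost/chrono.hpp>
   #include <list>

   // Try to join every thread in the range until the given time point,
   // erasing the threads that could be joined. Returns true if the
   // container is empty at the end, i.e. every thread was joined.
   template <class Clock, class Duration>
   bool try_join_all_until(std::list<boost::thread>& threads,
       const boost::chrono::time_point<Clock, Duration>& tp)
   {
     for (std::list<boost::thread>::iterator it = threads.begin();
          it != threads.end(); )
     {
       if (it->try_join_until(tp)) it = threads.erase(it); // joined in time
       else ++it;                             // deadline reached, keep it
     }
     return threads.empty();
   }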
>
> I don't know who the Boost.Thread maintainer is, nor how/who decides
> whether a design is good enough to be implemented, or does it work the
> other way around?
Here I was talking about my alternative solution.
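To make the idea concrete, here is roughly what I mean (just a sketch,
untested; the container, the 100-millisecond polling period and the use of
std::atomic are only illustrative, and I assume C++11 so that boost::thread
can be stored by value):

   #include <boost/thread/thread.hpp>
   #include <boost/chrono.hpp>
   #include <atomic>
   #include <list>

   std::atomic<bool> take_too_much_time(false); // written by the UI thread

   // The owner thread is the only one that ever joins/interrupts, so no
   // mutex around the container is needed.
   void owner_joins(std::list<boost::thread>& threads)
   {
     while (!threads.empty())
     {
       // Wait a little for the first thread; waking up periodically lets
       // the owner notice the UI request without sharing any lock with it.
       if (threads.front().try_join_until(
               boost::chrono::steady_clock::now()
               + boost::chrono::milliseconds(100)))
       {
         threads.pop_front();                 // joined: remove it
       }
       else if (take_too_much_time.load())
       {
         // The UI decided it is taking too long: interrupt the remaining
         // threads and then join all of them.
         for (std::list<boost::thread>::iterator it = threads.begin();
              it != threads.end(); ++it)
           it->interrupt();
         for (std::list<boost::thread>::iterator it = threads.begin();
              it != threads.end(); ++it)
           it->join();
         threads.clear();
       }
     }
   }

The user-interface thread only does take_too_much_time.store(true) when
the user asks to stop; it never touches the container nor any mutex.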
>
>>>>> 2) thread_group now maintains a list of handlers with the
>>>>> responsibility to:
>>>>>        -) Prevent join and interrupt from being called concurrently
>>>>>           on a thread
>>>>>        -) Avoid calling join on an already joined thread
>>>>>        -) Avoid calling interrupt on an already joined/interrupted
>>>>>           thread
>>>>
>>>> IMO, all the threads in a thread_group are owned by the group and use
>>>> move semantics; there is no need to use pointers to threads. As a
>>>> consequence there is no need for the handler/wrapper.
>>>
>>> This is true only if the thread_group does not allow multiple threads
>>> to interrupt/join concurrently.
>> I understand now why you did it this way, but I would not do that.
>
> Then the maintainer is you?
Here I was talking as a user, that is, I would not use the design of
your application.
And yes, I'm the maintainer, together with Anthony Williams, who is the
principal author.
>
>>>>> 4) Due to the fact that mutexes are not fair, a thread issuing an
>>>>>     interrupt_all will most likely go into starvation if another
>>>>>     thread is issuing a join_all (especially if the group contains
>>>>>     a single thread). I can work on it.
>>>>
>>>> Could you clarify your concern?
>>>
>>> Sure, if a thread performs a closed loop:
>>>
>>> <snip>
>>> then it goes into starvation; we have observed this (even in a
>>> deterministic way), so I had to make the two interrupt/timed_join
>>> calls on the thread handler fair to each other.
>>> Our platform is Linux with a 3.2.0 kernel.
>>>
>> IMO, only one thread should join/interrupt all the threads. This avoids
>> all these issues.
>
> That avoids the issues at the thread_group level, but those issues will
> be present in an upper layer.
Maybe. Have you identified these issues in the design I proposed above?
>
>
>>>> As I said I'm for deprecating thread_group in Boost.Thread. The
>>>> implementation you are proposing has not changed my mind.
>>>
>>> Fine with me; leaving it as it is now is worse than deprecating it.
>> The class thread_group is not a big one. You are free to propose to
>> Boost whatever you consider good for the Boost community.
>
> I'm proposing to decide what thread_group has to do, because as it is
> now it has design flaws.
I agree completely about the design flaws. This thread_group class should
be a user-specific class, so that each user can adapt it to their specific
needs.
>
>>>> I would prefer an approach where data structures (containers) and
>>>> algorithms
>>>> (join, interrupt, ...) are separated, thread-safety is not mandatory,
>>>> ...
>>>
>>> I agree with you that, unless explicitly stated otherwise, a library
>>> is not meant to be thread-safe, but this rule doesn't fit a thread
>>> library.
>> This is your opinion and I respect it even if I don't share it.
>>> I suggest stating explicitly, even in bold, that the thread_group and
>>> thread classes are not thread-safe.
>> I believed we had agreed that the thread group was thread-safe! Of
>> course the thread class is not thread-safe. I could add something like
>> "all the functions in this library are not thread-safe unless stated
>> explicitly", as I have done for Boost.Chrono since the beginning.
>
> From a purely academic point of view the thread group is thread-safe,
> because it protects its internal list so it can be used by multiple
> threads; but on the other hand, if it is used (as it is now) from
> multiple threads, then the best you can get is anyone's guess: joining
> a batch of already-joined threads or, for that matter, interrupting an
> already-joined group.
If the current design doesn't meet your expectations, you should use an
alternative.
> It seems it was coded at a time when it was not a problem to join a
> thread that had already been joined/interrupted.
You are right, the last change in thread had some undesirable impacts on
thread_group. IMO, the two fixes I reported in this thread resolve the
issues.
Please let me know if this is not the case.

Best,
Vicente