Subject: Re: [boost] [BGL] Trying to get a correclty working Parallel BFS code
From: Sandeep Gupta (gupta.sandeep_at_[hidden])
Date: 2009-10-27 23:33:21


On Tue, Oct 27, 2009 at 12:33 PM, Sandeep Gupta <gupta.sandeep_at_[hidden]>wrote:

>
>
> On Tue, Oct 27, 2009 at 10:27 AM, Jeremiah Willcock <jewillco_at_[hidden]>wrote:
>
>> On Tue, 27 Oct 2009, Sandeep Gupta wrote:
>>
>> On Mon, Oct 26, 2009 at 11:29 PM, Jeremiah Willcock <jewillco_at_[hidden]
>>> >wrote:
>>>
>>> On Mon, 26 Oct 2009, Sandeep Gupta wrote:
>>>>
>>>> Hi All,
>>>>
>>>>> I have spent significant effort on parallel bfs. Currently its almost
>>>>> working. I would really appreciate some help to get it running
>>>>> correctly.
>>>>> I
>>>>> have attached the code (self contained and requires no input file).
>>>>> I think I have followed all the guidelines regarding distributed
>>>>> property
>>>>> maps mentioned in the docs (
>>>>>
>>>>>
>>>>> http://www.osl.iu.edu/research/pbgl/documentation/graph/breadth_first_search.html
>>>>> ).
>>>>> I am out of ideas of where things can could have gone wrong.
>>>>>
>>>>>
>>>> The graph is distributed by source vertex across all processors, and so
>>>> you
>>>> need to add them on every processor (or at least on the processor that
>>>> owns
>>>> them, which is probably not processor 0). Thus, your graph is missing
>>>> whichever edges would be on processor 1, leading to incorrect results.
>>>> Try
>>>> adding all of the edges on all processors and see if this fixes your
>>>> problem.
>>>>
>>>> -- Jeremiah Willcock
>>>>
>>>>
>>>>
>>> Jeremiah, thanks so much looking into this promptly. I took cues from
>>> building distributed graph section of distributed adjacency list page
>>> which
>>> has an example:
>>>
>>> ///---------------Begin example
>>>
>>> Graph g(n); // initialize with the total number of vertices, n
>>> if (process_id(g.process_group()) == 0) {
>>> // Only process 0 loads the graph, which is distributed automatically
>>> int source, target;
>>> for (std::cin >> source >> target)
>>> add_edge(vertex(source, g), vertex(target, g), g);
>>> }
>>>
>>> // All processes synchronize at this point, then the graph is complete
>>> synchronize(g.process_group());
>>> //---------End
>>>
>>> Looking at this, it seems that distribution happens automatically which
>>> is
>>> what the manual says.
>>> This is further confirmed when I output the subgraphs at each process.
>>>
>>
>> OK. I was mistaken on this point then. Could you please try the
>> following graph with a distribution that puts vertices 0-3 on rank 0 and 4
>> on rank 1 (just add three dummy vertices on the end and use a block
>> distribution)?
>>
>> 0 -> 1
>> 1 -> 2
>> 2 -> 3
>> 0 -> 4
>> 4 -> 3
>>
>> See if the depths are reasonable for that graph. Also, the contents of
>> the predecessor maps output by your search (both the original one and the
>> graph I just gave) would be nice too; that is just an extra map you can give
>> to BFS and you can just dump it the same way you dump depth values.
>>
>>
>
>> I tried adding predecessor map but I don't think its currently. No
>> reduction function is defined for predecessor map. Please see below for the
>> attached error.
>>
> I tried the bfs with your input (and distribution) and the depth number
> are coming out to be incorrect. This is a surely a better input to debug
> this problem:-).
> The depth value of node n=3 should be d=2 but its coming out to be d=0.
> I believe the update message from node n=4 on process 1 is not received or
> updated by process 0.
> Attached below is the output of levels:
> Finished Parallel BFS
> Node ID : Level
> 0 : 0
>
> 1 : 1
> 2 : 2
> 3 : 0
> 4 : 1
> 5 : 4294967295
> 6 : 4294967295
> 7 : 4294967295
>
> The two subgraphs on each process
> Subgraph Begin
> n0 -> n1;
> n0 -> n4;
> n1 -> n2;
> n2 -> n3;
> Subgraph End
>
> Subgraph Begin
> n4 -> n3;
> Subgraph End
>
> Thanks,
> Sandeep
>
> //-------------------------
> The error snippet regarding predecessor map:
>
> error: no match for call to
> '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>)
> (const boost::detail::parallel::global_descriptor<unsigned int>&)'
>
>
> ../../../boost/property_map/parallel/impl/distributed_property_map.ipp:141:
> error: no match for call to
> '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>)
> (const boost::detail::parallel::global_descriptor<unsigned int>&,
> boost::detail::error_property_not_found&, const
> boost::detail::error_property_not_found&)'
> ../../../boost/graph/parallel/properties.hpp:95: note: candidates are: T
> boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T)
> const [with T = boost::detail::error_property_not_found]
> ../../../boost/graph/parallel/properties.hpp:96: note: T
> boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T,
> T, T) const [with T = boost::detail::error_property_not_found]
>
>
> Jeremiah, Do you think that I should file a bug report for this.
Although I was hoping (and actually needed quite urgently) that it would be
minor issue and get fixed quickly.

Thanks,
Sandeep