Subject: Re: [boost] [BGL] Trying to get a correctly working Parallel BFS code
From: Sandeep Gupta (gupta.sandeep_at_[hidden])
Date: 2009-11-09 20:22:03
On Wed, Nov 4, 2009 at 3:39 PM, Nick Edmonds <ngedmond_at_[hidden]> wrote:
> On Nov 4, 2009, at 6:22 PM, Sandeep Gupta wrote:
>> On Wed, Nov 4, 2009 at 3:11 PM, Sandeep Gupta <gupta.sandeep_at_[hidden]> wrote:
>>> On Wed, Nov 4, 2009 at 12:18 PM, Nick Edmonds <ngedmond_at_[hidden]> wrote:
>>>> On Nov 3, 2009, at 10:44 PM, Sandeep Gupta wrote:
>>>>> On Mon, Nov 2, 2009 at 12:43 PM, Nick Edmonds <ngedmond_at_[hidden]> wrote:
>>>>>> On Oct 31, 2009, at 1:00 AM, Sandeep Gupta wrote:
>>>>>>> On Fri, Oct 30, 2009 at 6:41 AM, Jeremiah Willcock <jewillco_at_[hidden]> wrote:
>>>>>>>> On Thu, 29 Oct 2009, Sandeep Gupta wrote:
>>>>>>>>> On Thu, Oct 29, 2009 at 1:12 PM, Jeremiah Willcock <jewillco_at_[hidden]> wrote:
>>>>>>>>>> On Thu, 29 Oct 2009, Sandeep Gupta wrote:
>>>>>>>>>>> On Wed, Oct 28, 2009 at 5:49 PM, Sandeep Gupta <gupta.sandeep_at_[hidden]> wrote:
>>>>>>>>>>>> On Tue, Oct 27, 2009 at 9:31 PM, Jeremiah Willcock <jewillco_at_[hidden]> wrote:
>>>>>>>>>>>>> On Tue, 27 Oct 2009, Sandeep Gupta wrote:
>>>>>>>>>>>>>> The error snippet regarding the predecessor map:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> error: no match for call to
>>>>>>>>>>>>>> '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>) (const boost::detail::parallel::global_descriptor<unsigned int>&)'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ../../../boost/property_map/parallel/impl/distributed_property_map.ipp:141: error: no match for call to
>>>>>>>>>>>>>> '(boost::property_reduce<boost::vertex_predecessor_t>::apply<boost::detail::error_property_not_found>) (const boost::detail::parallel::global_descriptor<unsigned int>&, boost::detail::error_property_not_found&, const boost::detail::error_property_not_found&)'
>>>>>>>>>>>>>> ../../../boost/graph/parallel/properties.hpp:95: note: candidates are:
>>>>>>>>>>>>>> T boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T) const [with T = boost::detail::error_property_not_found]
>>>>>>>>>>>>>> ../../../boost/graph/parallel/properties.hpp:96: note:
>>>>>>>>>>>>>> T boost::property_reduce<boost::vertex_predecessor_t>::apply<T>::operator()(T, T, T) const [with T = boost::detail::error_property_not_found]
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeremiah, do you think that I should file a bug report for this? Although I was hoping (and actually needed it quite urgently) that it would be a minor issue and get fixed quickly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> You need to pass a predecessor map to the algorithm as one of the parameters (or put it as a part of your graph, but making an external property map to give to the algorithm is easier).
>>>>>>>>>>>>>
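If I understand this correctly, the plain (non-distributed) BGL idiom for handing the algorithm an external predecessor map looks roughly like the sketch below. The names (preds, IndexMap) and the use of an iterator_property_map are my assumptions; a distributed run may instead want the graph's internal predecessor property or make_distributed_property_map, as discussed further down.

    // Sketch: record predecessors (and distances) through the BFS visitor.
    // Assumes a graph type Graph, a graph g, a source vertex start, and an
    // existing distance map named distance.
    // (breadth_first_search lives in boost/graph/breadth_first_search.hpp,
    // record_distances/record_predecessors in boost/graph/visitors.hpp.)
    typedef boost::graph_traits<Graph>::vertex_descriptor vertex_descriptor;
    typedef boost::property_map<Graph, boost::vertex_index_t>::type IndexMap;

    std::vector<vertex_descriptor> preds(num_vertices(g), start);
    boost::iterator_property_map<std::vector<vertex_descriptor>::iterator,
                                 IndexMap>
      pred(preds.begin(), get(boost::vertex_index, g));

    boost::breadth_first_search(g, start,
        boost::visitor(boost::make_bfs_visitor(std::make_pair(
            boost::record_distances(distance, boost::on_tree_edge()),
            boost::record_predecessors(pred, boost::on_tree_edge())))));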
>>>>>>>>>>>> Hi Jeremy,
>>>>>>>>>>>>
>>>>>>>>>>>> It took me a while but I finally figured out how to pass the predecessor map. Unfortunately it doesn't have any effect. Also, I might be wrong, but I don't see any logical reason why the predecessor map should have any bearing on the correctness of the depth. I have attached the new code. I am not able to print out the predecessor because I am not able to figure out its type. Maybe you could help me resolve this.
>>>>>>>>>>>>
>>>>>>>>>>>> It would be nice to have this code running. I need to profile graph performance on a new machine by tomorrow. Again, thanks for your patience and time. I really appreciate you looking into this.
>>>>>>>>>>>>
>>>>>>>>>>> Hi Jeremiah,
>>>>>>>>>>>
>>>>>>>>>>> Was just wondering if you had time to look into this or have any suggestion on how to proceed further. I can get you the output of the predecessor map if that would help. It's just that I haven't been able to figure out the type of the property map returned by make_distributed_property_map. Please let me know your thoughts.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> The person who really knows this library is not around as far as I know; he might be on vacation. The type returned from make_distributed_property_map is documented on <URL: http://www.osl.iu.edu/research/pbgl/documentation/graph/distributed_property_map.html>. One thing that you could do to debug things is to put in your own color map and look at what values it ends up with at the end of the algorithm, and possibly even to put in a local color map that prints out when elements are changed. That might be a lot of work for what could end up being an obvious problem (that I just don't know how to diagnose), though. BTW, does the example in libs/graph_parallel/example/breadth_first_search.cpp work for you? It appears to be doing the same thing you want to do. If that works, it could be a problem in your visitor.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I derived this code from graph_parallel/example/breadth_first_search.cpp. The problem is that I don't understand the input and the output of the example code. It would be great if I could get an explanation of the format of the input file. Then I can transform my graph into that format and pass it to the algorithm. That would be enough for my current purpose.
>>>>>>>>>
>>>>>>>>> As time permits I will try out your suggestions and let you know of any updates. In the meantime I am hoping that I will get input from the other relevant authors.
>>>>>>>>>
>>>>>>>>
>>>>>>>> This is what I could tell from the code.
>>>>>>>>
>>>>>>>> Input: http://people.sc.fsu.edu/~burkardt/data/metis_graph/metis_graph.html
>>>>>>>>
>>>>>>>> Output: http://www.graphviz.org/doc/info/lang.html
>>>>>>>>
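For concreteness, here is what I believe the METIS file for the small test graph discussed below (0 --> 1 --> 2 --> 3 plus 0 --> 4 --> 3, taken as undirected) would look like, assuming the plain unweighted format: the first line gives the vertex and edge counts, and line i then lists the 1-based neighbours of vertex i.

    5 5
    2 5
    1 3
    2 4
    3 5
    1 4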
>>>>>>> Thanks Jeremiah. The link was helpful. I understood the input graph format. It's for undirected graphs only. However, the BFS output is still coming out to be incorrect. I tried with a sample line graph of 4 nodes and it exhibits the same problem as before.
>>>>>>>
>>>>>>> The distances of nodes on process 2 are not getting updated. On the other hand, the dijkstra_shortest_paths example is working correctly.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sandeep
>>>>>>>
>>>>>>>
>>>>>> I rewrote most of your code before I realized that you're outputting all your distances on process 0, even for vertices process 0 doesn't own. The large value you see is std::numeric_limits<distance_type>::max(). When you call get(distance, vertex(i, g)) for some i not owned by the current process, a ghost cell is created for the requested value and a request is sent to the owner of that vertex for the correct value. get() then returns whatever the default value is for the distance property map, which in this case is infinity (i.e. std::numeric_limits<distance_type>::max()).
>>>>>>
>>>>>> In some cases you may get the correct distance value because process 0 has previously requested that value (and it may or may not be up to date). If process 0 has never requested that distance value and it doesn't own the vertex, then you'll get infinity.
>>>>>>
>>>>>> You need to gather all your data before you output it from a single node (i.e. issue a get() or request(), which is a get with no return value, for each data element). If you want your code to be scalable you should output your data in a distributed fashion from all nodes at once. Remember that updated values won't be guaranteed to arrive until the next synchronization step.
>>>>>>
>>>>>> If you want the modified version of your code let me know and I'll finish it and ship it your way.
>>>>>>
>>>>>> -Nick
>>>>>>
>>>>>>
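To make sure I follow the gather step, something along these lines is what I have in mind before printing from process 0. This is a sketch only; it assumes a global vertex count n, the distributed distance map named distance, and that request() and synchronize() are the operations meant above.

    // Ask the owners for every distance value, then wait for the replies,
    // which are only guaranteed to arrive at the next synchronization step.
    for (std::size_t i = 0; i < n; ++i)
      request(distance, vertex(i, g));      // a get() with no return value
    synchronize(process_group(g));

    // The ghost cells on this process are now filled in, so printing from
    // rank 0 sees real distances instead of the default (infinity).
    if (process_id(process_group(g)) == 0)
      for (std::size_t i = 0; i < n; ++i)
        std::cout << i << " : " << get(distance, vertex(i, g)) << "\n";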
>>>>> Hi Nick,
>>>>>
>>>>> Thanks so much for looking into this. Sorry, I couldn't get to it earlier due to other concerns. I corrected the mistake you mentioned and I don't get numeric_limits<>::max() for any nodes anymore.
>>>>>
>>>>> I believe graph/distributed/graphviz.hpp was outputting the distance in the correct fashion.
>>>>>
>>>>> Unfortunately, the output is still coming out to be incorrect. It was also the case with the output from graphviz.hpp.
>>>>>
>>>>> For the test graph that we mentioned before, essentially
>>>>> 0 --> 1 --> 2 --> 3,
>>>>> 0 --> 4 --> 3
>>>>>
>>>>> the distance output is as follows:
>>>>> global node-id : distance
>>>>> 0 : 0
>>>>> 1 : 1
>>>>> 2 : 2
>>>>> 3 : 1 // (wrong: should be 2)
>>>>> 4 : 0 // (wrong: should be 1)
>>>>>
>>>>> If possible, can you send me your version of the code? Let me know if you need mine (the updated version).
>>>>>
>>>>>
>>>>
>>>> Given the description of your problem above, and assuming you're using a block distribution with 8 vertices, it sounds like you're initializing the distance to the first vertex on *each* process to 0, as opposed to only the distance to the source. The vertex with global index 4 is actually local index 0 on process one using the above distribution. If you initialize the distances to the vertices with global indices 0 and 4 to 0, then your results make sense. Let me know if that's not the case, but I surmise that it is.
>>>>
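In other words, only the true global source should start at 0. A sketch of the kind of initialization I have in mind (assuming an internal vertex_distance_t property of type std::size_t, and a block distribution so that global vertex 0 lives on rank 0):

    // Every locally owned vertex starts at "infinity"
    // (BGL_FORALL_VERTICES comes from boost/graph/iteration_macros.hpp).
    BGL_FORALL_VERTICES(v, g, Graph)
      put(distance, v, (std::numeric_limits<std::size_t>::max)());

    // Only the single global source gets distance 0, not local vertex 0
    // of every process.
    boost::graph_traits<Graph>::vertex_descriptor start = vertex(0, g);
    if (process_id(process_group(g)) == 0)
      put(distance, start, 0);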
>>> Very much so :-). Thanks for getting me started by catching these minor but fatal misunderstandings with the parallel BFS code. I understand now that the call vertex(i, g) creates a descriptor for the ith global vertex.
>>>
>>>> Also there's a problem in your depth labeling visitor. In the case of cross-processor edges the process that owns the target of the edge doesn't necessarily have the distance to the source of the edge available locally. This will lead to incorrect distance labels even though tree and non-tree edges will still be correctly recognized, since the BFS is level-synchronized. The fastest way to handle this is to modify the BFS queue to also pass distance data (which is how the shortest paths algorithms do it). You could also send the data separately and probe for it on the receiving side. Check out the distributed betweenness centrality code for how to create a distributed queue which contains tuples of data, one element of which is a vertex (in order to know where to route the data). Integrating this type of vertex queue (i.e. one that contains a data payload) with BFS basically just requires handling the data payload on the receiving side.
>>>>
>>> I would try this out. So basically the default call
>>>
>>> boost::breadth_first_search(g, start,
>>>     boost::visitor(boost::make_bfs_visitor(
>>>         boost::record_distances(distance, boost::on_tree_edge()))));
>>>
>>> wouldn't work. I assumed that the distributed BFS implementation did exactly what you mentioned. Although I would write my own parallel visitor as well, per your suggestions.
>>>
>>> thanks
>>> Sandeep
>>>
>>>
>>>
>> Let me also add that I did modify the code so that it initializes only the distance to the 0th vertex of processor 0. In this case I am facing the dilemma of what I should pass as the start vertex for processes > 0. I can't pass the global start vertex (probably the type won't be acceptable to breadth_first_search).
>>
>> I will dig further into the betweenness centrality and dijkstra code to figure this out. Appreciate your input.
>>
>
> The short answer is, just pass the same source vertex on all processors.
>
> Explanation: Each processor will push the source vertex on the distributed queue, but all the non-local pushes end up being no-ops since the source vertex is black at the end of the first BFS level when the messages arrive. The source vertex just has to be a graph_traits<Graph>::vertex_descriptor, which you can construct using the vertex() call if you want to generate it from a global vertex index. The fact that all processors push the source vertex is an artifact of reusing the sequential BFS code; it could be changed but I haven't found it to be an issue thus far.
>
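Concretely, Nick's suggestion seems to amount to something like this sketch (using global index 0 as the source is my assumption):

    // Construct the same global source vertex on every process; the
    // non-local pushes become no-ops as described above.
    boost::graph_traits<Graph>::vertex_descriptor start = vertex(0, g);
    boost::breadth_first_search(g, start,
        boost::visitor(boost::make_bfs_visitor(
            boost::record_distances(distance, boost::on_tree_edge()))));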
> This actually brings up an interesting feature of BFS: it's possible to start a BFS from multiple source vertices by passing a different source on each processor (you can also pass a queue containing additional starting vertices if you need > num_procs sources). The strongly connected components algorithm uses this approach to run many BFSs on disconnected subgraphs in parallel.
>
> One last note on your visitor: my statement was incorrect, sorry. I was looking at some other code and confused it with yours; your depth labeling visitor will work fine because it 'pushes' distance labels. Basically it writes the distance to the vertex it pushes onto the queue into the distributed property map at the same time (actually immediately before). This means that at the next synchronization step both the distance and the vertex descriptor will arrive; in fact the ordering of the messages ensures that the distance will arrive prior to the vertex on the queue. Sorry about that, I was looking at another visitor that was 'pulling' data, which is much more problematic.
>
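For my own notes, such a 'pushing' depth labeller is just a tree-edge event visitor along these lines. This is a sketch; record_distances(distance, on_tree_edge()) already behaves exactly this way, and DistanceMap stands in for whatever distance map type is in use.

    // tree_edge() fires on the rank that owns source(e, g), so the get() is
    // local; the put() to the (possibly remote) target is sent out and, per
    // the message ordering described above, arrives before the queued vertex
    // itself is processed.
    struct depth_labeller : boost::default_bfs_visitor {
      DistanceMap distance;
      explicit depth_labeller(DistanceMap d) : distance(d) {}

      template <typename Edge, typename Graph>
      void tree_edge(Edge e, const Graph& g) const {
        put(distance, target(e, g), get(distance, source(e, g)) + 1);
      }
    };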
> Hopefully that solves all your problems and my apologies again on leading you to believe there was something wrong with your visitor.
>
> The data model is a little tricky to wrap your head around, but once you've written some code it should become more intuitive.
>
Hi Nick,

Thanks for the explanation of the distributed BFS. This is indeed quite helpful, especially for the later stages. I haven't gotten to a point where I can comment on the data model, but the current architecture is swell from what I can judge.

I hesitate to ask for an explanation of every minor detail, but I am now confused about passing the same vertex descriptor. Specifically, I didn't quite grasp the statement "the source vertex just has to be graph_traits<Graph>::vertex_descriptor, which you can construct using the vertex() call if you want to generate it from a global vertex index". The source vertex has to be a graph_traits<Graph>::vertex_descriptor, and the vertex() call takes two arguments, the index and the graph -- but these two parameters can only identify local vertices. For global vertices we need global descriptors, which also require the owner.

Appreciate your help in clarifying this.

Thanks
Sandeep