$include_dir="/home/hyper-archives/geometry/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [geometry] R-Tree Nearest Neighbors Unique Member From Group
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2018-05-16 17:34:31
Hi,
Jpb Via Geometry wrote:
> Hi Guys,
>
> I have an rtree where each item in the tree has a group ( Colors perhaps:
> red, green, blue, orange...)
> I want to do a nearest neighbor search where the search returns the nearest
> k-neighbors with at most
> one object from each group.
>
> For example, if we have
>
> Object A: Red,    distance 1
> Object B: Red,    distance 2
> Object C: Blue,   distance 3
> Object D: Green, distance 4
>
> and I do a nearest neighbor at most one from each group search with k = 2
> then
> I get object A and C. B is skipped since A is closer of the same group.
>
> It looks like I could get this done by creating a specialization of
> distance_query_result class and have the distance_query_result::store method
> to check the groups/distances of already stored items and act accordingly.
> Then create a specialization of distance query and maybe query_dispatch to
> get distance_query to use the new distance_query_result.
>
> Is there a better way to go about this?
No, there is no "official" (i.e. exposed in the interface) way of doing 
it besides calling query N times for each group/category (using 
nearest() && satisfies()) and after that merging the results. So your 
approach is more optimal than that however it relies on the internals of 
the R-tree so have in mind that they may be changed some time in the future.
I'd suggest keeping N stacks (neighbors) one for each group and track 
them separately.
You also have to somehow pass the number of groups into the 
distance_query_result.
Furthermore you have to implement 
distance_query_result::greatest_comparable_distance() and return the 
greatest distance from all groups but only if all heaps contain at least 
one element each.
But I'm wondering, are the elements from all groups distributed more or 
less in uniform fashion? I'm asking because otherwise the query may have 
to traverse big chunks of the tree before finding at least one element 
from each group. Consider the situation where there is a clear 
separation between groups and groups are spatially far from each other. 
In this case my guess is that it'd be better to have separate R-tree for 
each group, perform knn query on all of them and merge the results.
An alternative to the solution above (for groups spatially far from each 
other) would be to have one R-tree but to store in the nodes the 
information about the groups of objects sotred in underlying nodes and 
custom query visitor taking into account currently stored neaighbors and 
groups covered by analysed nodes. So the idea is that the pruning of 
nodes would be done both using the distance to the furthest neighbor 
already found as well as groups of neighbors. This would not require to 
merge the result at the end since the query woudl handle everything. But 
it would require even more changes to the R-tree including introducing 
new nodes (containing info about underlying groups), insertion visitor 
(gathering the info about the groups from children and passing it to the 
newly created node) and packing algorithm (if needed, the same as 
insertion).
Regards,
Adam