$include_dir="/home/hyper-archives/geometry/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [geometry] How can I Help?
From: Adam Wulkiewicz (adam.wulkiewicz_at_[hidden])
Date: 2014-11-26 11:48:44
Hi Henry,
Henry Roeland wrote:
> Dear all,
>
> First let me introduce myself: my name is Henry Roeland 37 years old and currently working as software engineer for a company that delivers maps to car companies like BMW and VW.
> I'm working in the testtools team that is responsible for a map viewer and maptest batch tool.
> Technics that we use are C/C++, Sqlite, spatialite and of corse boost.
>
> Personally I have multiple years of C++ experience and busy getting up to speed with geometry technics in 2D. NeXT to that just started with studying advanced datastructures and algoritms like b+tree and r*tree.
>
> Currently I see your rtree as a perfect candidate for in memory container/index/cache in our map interface. As I wrote in my previous email the feature that is not supported yet is paging. How can I help to accomplish this feature? Are there specs? Or am I the only one in favior of this feature?
Certainly not the only one. People are asking about it from time to 
time, using Interprocess' mapped file as a replacement. You're the only 
one who is willing to help with the implementation. I greately 
appreciate it.
As I mentioned before I was thinking about a stateful 
allocator/storage/manager handling file or files/pages, cacheing etc. 
This allocator would return pointers which when dereferenced would force 
the allocator to load and cache data. Then, there would be a mechanism 
to tell the allocator that some pointers/data aren't used anymore. It 
could be done explicitly by calling some function on allocator like:
persistent_allocator alloc(...);
rtree<..., persistent_allocator> rt(..., alloc);
rt.insert(...);
rt.insert(...);
rt.insert(...);
alloc.flush();
The next step could be adding of some hooks or traits for allocators 
which would be called by the rtree internally on the end of some 
operation, e.g. insert. So this wouldn't be a C++ allocator anymore 
since it'd have additional methods.
To avoid storing a reference to allocator/storage in those smart 
pointers the rtree could notify the allocator/storage when a pointer is 
dereferenced and data is required. So the responsibility for the 
notification of allocator/storage would be moved from a pointer to the 
rtree.
Finally the rtree could notify the allocator/storage each time the data 
owned by a pointer is no longer needed but this probably wouldn't change 
anything. Furthermore modifying and releasing the nodes before an 
operation is finished wouldn't be safe in case when an exception was 
thrown. In fact, this allocator should keep a copy of nodes in memory 
during a whole operation and then at the end of the operation save the 
data into a persistent storage.
The next thing is how to serialize data from and to node. We should ask 
ourselves should we allow the users to implement other persistent 
storage variants on their own. If the answer was no, then we could just 
internally in the persistent allocator use the internals of the rtree. 
But if we wanted to allow it (which IMO should be done) the most 
straightforward way would be to expose the interface used by the rtree 
internally to access node's elements. Exposing it would mean that we 
couldn't change it in the future, so I'd prefer to avoid it. Instead, we 
could mimic or directly use Boost.Serialization interface for Nodes. In 
this scenario a specific Node would know what data must be saved and 
loaded and it'd be able to save or load to/from some generic Archive. 
Depending on the needs this could be some temporary 1-node Archive 
gathering data for later or a target Archive capable to serialize data 
on the fly, it'd depend on an allocator/storage.
This way we could also support versioning on a node level, the same way 
how it's done in Serialization. So changes would have to be done on a 
nodes level not in the user-defined allocator. An example could be an 
implementation of a rtree variant storing additional data in nodes 
(hilbert r-tree) or additional types of nodes (PRtree). Also arbitrary 
user-defined Values would be serialized the same way (using 
Serialization or familiar Serialization-like interface).
And this way we'd also support Serialization in one go.
We'd probably need some file header with the parameters of an rtree both 
in persistent storage and in serialization support. Similar as with 
nodes, an rtree could know how to load/save the header from/to some 
Archive. The rtree should e.g. check if it's possible to use persistent 
storage and load data, e.g. if it wasn't created using incompatible 
parameters, etc.
So this is how I see it. Of course it isn't a plan, rather an idea. Feel 
free to point out any errors or present your ideas.
> I started studying your code and busy generating uml diagram of your code to get the big picture.
Ok, if you have any questions feel free to ask. Maybe I'll do some 
introduction about the internals.
1. Nodes
The rtree handles nodes using visitor pattern. It's more or less how 
Boost.Variant and static_visitor works.
Currently there are 2 types of nodes, internal nodes and leafs. Internal 
nodes store a container of Box, pointer pairs, leafs store Values of 
type passed as the 1st rtree template parameter.
Furthermore nodes can store static-size or dynamic-size containers. The 
kind is identified by a tag.
For each kind there is a template allocators<> storing allocator objects 
needed to construct internal nodes and leafs.
There are also specializations of utilities create_node<> and 
destroy_node<> implementing creatin and destruction of nodes.
Everything related to nodes is here: 
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/node
2. Algorithms
Nearly all operations are implemented using visitor pattern: insertion, 
removal, query, copy, destruction 
(https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/visitors).
The simplest one is a visitor checking if a node is internal node or a 
leaf: 
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/visitors/is_leaf.hpp
The most complex operation is insertion.
3. Insertion
bgi::detail::rtree::visitors::insert<> allows to insert a Value or an 
Element on a desired level of a tree (there are 2 specializations). See:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/visitors/insert.hpp
line 403 and below.
The flow of an original R-tree balancing algorithm was slightly changed 
to allow nice decomposition of functional parts.
At each level of traversal:
1. For internal node
   1.1. bgi::detail::rtree::choose_next_node<> algorithm is called 
(default implementation in line 25 of visitors/insert.hpp)
   1.2. the tree is traversed using the choosen node
2. For leaf a new Value is added
3. If there is an overflow (too many elements)
   3.1 a bgi::detail::rtree::split<> algorithm is called (default in 
line 109 of visitors/insert.hpp)
   3.2 split algo creates a new node and...
   3.3 bgi::detail::rtree::redistribute_elements<> is called, it 
redistrbutes contained elements between nodes
Depending on parameters various algorithms are used, each rtree type may 
specialize its own algorithms.
In fact all rtree variants define different redistribute_elements<>.
linear and quadratic rtrees uses the default insert<> and 
choose_next_node<>.
R*tree specializes also choose_next_node<> and insert<>.
Each variant has its own directory:
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/linear
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/quadratic
https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/index/detail/rtree/rstar
4. Parameters and options
All of the above, nodes and algorithms are identified by tags (using 
tag-dispatching technique).
Binding of rtree parameters with tags is done in 
bgi::detail::rtree::options<> and can be seen here:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/options.hpp
5. Pack create
Plus there is also 1 packing algorithm creating the rtree in a top-down 
manner here:
https://github.com/boostorg/geometry/blob/develop/include/boost/geometry/index/detail/rtree/pack_create.hpp
It works more or less like a classic top-down kd-tree creation algorithm 
using object median split.
> One remark is that I have to help in my own/spare time.
Sure, no pressure. I'm working on the rtree in my free time as well and 
I appreciate any help :)
Regards,
Adam