$include_dir="/home/hyper-archives/geometry/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [geometry] 1.57 Storing indices as values to save space.
From: Tim Finer (tim.finer_at_[hidden])
Date: 2014-11-26 18:56:51
Hi Adam,
On 11/26/14, 11:49 AM, Adam Wulkiewicz wrote:
> Hi Tim,
>
> Tim Finer wrote:
>> Hi Adam,
>>
>> Actually, I think what rtree is doing with the temporary vector in 
>> the packing constructor should be considered a bug.
>>
>> That problem is that the vector is created without regard to the 
>> rtree's allocator - so even if a user gives the rtree a memory mapped 
>> file allocator, the rtree creates a copy of the incoming data in 
>> memory, ignoring the allocator.
>>
>
> Not really a copy of incoming data. A calculated centroids and 
> iterators to the original data.
Ah I see.  IndexableGetter must be the source for the centroid though, 
right?  Can you use the IndexableGetter to calculate and sort on the 
centroid without allocating a temporary vector of them?
>
> Yes, rtree's allocator could be passed to the vector (as mentioned in 
> the other email). Then iterators shouldn't be stored since they might 
> be not suitable for storing in shared memory. So probably a copy of 
> actual objects should be made here. And this could be done only if an 
> allocator was different than std::allocator<> so at least in this case 
> it wouldn't harm the performance. Also in the case of passing 
> RandomAccessIterators elements indexes could be stored instead of 
> iterators.
Except I think the word "should" belongs there instead of "could". :)  
The whole point of passing in an allocator is for naught if the class 
selectively uses it - it doesn't matter about shared memory allocators 
or not.  All of the classes that I'm familiar with that use an allocator 
use it completely (like the stdlib containers).
Step back from your knowledge of the implementation and think about this 
as a user of the rtree.  Would you really expect the rtree to allocate 
space from the heap, after giving it an allocator?
The rtree shouldn't second guess the intentions of the users here and 
allocate memory from the heap for a temporary structure.  There are 
plenty of important uses cases to use other kinds of allocators that 
don't have anything to do with shared or mapped memory: debugging 
allocators, small object allocators, memory pools, etc. What makes this 
a bug is that it is a surprise.  It probably would have gone unnoticed 
by me but for the operating system killing my testbed.  :)
The need for this heap allocated vector appears to be packing related, 
would it make sense to expose the packing algorithm as a free function 
like the std::transform in <algorithm> instead? It could even take a 
callable object, just like transform does (it'd be nice to reuse the 
IndexableGetter there, if that's what it needs).
rtree::pack( srcBegin, srcEnd, packedBegin, packedEnd );
where srcBegin and srcEnd are random access, const_iterators and
std::distance(srcBegin,srcEnd) == std::distance(packedBegin, packedEnd).
Or an output iterator:
rtree::pack<Type, Type, Type>( srcBegin, srcEnd, packedIter );
Then the user would be responsible for allocating the packed container 
and the resultant packing what would be passed in as a ctor (or a 
factory method that has some sort of name that says "packed"):
auto rtree = 
bgi::rtree<ValueType,Partitioner,Indexer>make_packed_rtree(packedBegin, 
packedEnd);
// Or, the constructor
auto rtree = bgi::rtree<ValueType,Partitioner,Indexer>(packedBegin, 
packedEnd);
That gives the flexibility of allocation without compromising the API.  
I think it is cleaner to make a bulk loaded rtree construction more 
explicit (vs. insert) and it removes the "surprise" of allocation that 
packing requires.
>
> But would it solve your problem?
Yes.  My test of 50,000,000 points died because it ran out of memory 
(this is a "small" test).  It was a big surprise to see that the rtree 
needed to allocate outside of the allocator and not use the functor to 
derive what it needed.  I'm not focused on persistence, that was a means 
to an end.  My overall goal is to give my users the query ability of 
data too large to fit in memory.  If the rtree only stored the ValueType 
as implied in the documentation and only used the allocater, as implied 
by the API...
>
> Alterntively a second allocator could be passed for this temporary 
> container, so a second, temporary file could be used. This would 
> probably be the most convenient for you, am I right? However this 
> would require to complicate the interface (constructors) so I don't 
> like it.
I agree that putting yet another allocator in the constructor makes it 
even more complex than it should be.  Does the rtree really need to keep 
the centroids of all of the data while it is packing? Couldn't those be 
derived from calling the passed in IndexableGetter?  If not, then maybe 
getting the centroid is fundamental enough to deserve its own functor, 
or a some other way to model that supports indexing, finding the 
centroid and equality?
If the packing algorithm really needs the temporary storage, then I 
think that the constructor is doing too much and the functionality that 
is packing should be made a free function and the rtree's constructor 
made simpler.
>
> We should definietly think about it when implementing a "proper" 
> persistance support.
Yeah, persistence support is another thing entirely.  I'm talking about 
the happy case of memory based allocation right now - rtree is not 
honoring its API contract with regard to allocators.  The harder problem 
of persistence is another story, but yes, that would be grand to have 
all that too!
I'll take a look to see if I can't figure out if any of my ideas make 
any sense.  I don't know the domain well, but I do have a lot of C++ 
experience.  I'll let you know either way.
Best,
Tim