$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
From: Nathan Myers (ncm-yahoospam_at_[hidden])
Date: 2002-01-01 00:19:52
joel de guzman wrote:
> From: "Nathan Myers" :
> > Joel's implementation ... uses a sorted
> > vector of ranges, rather than a map indexed on the first element. As a
> > result, insert and erase operations involve block moves, but memory usage
> > is tight, where in extent_set<> everything involves tree operations.
> > Joel's uses reference counting, with copy-on-write to enable efficient
> > copy/assignment.
> >
> > The important differences are more subtle. Joel's interface emphasizes
> > construction, copying, combining and membership tests; extent_set<>
> > emphasizes inserting and erasing ranges, and scanning the ranges present.
> > ...
> The actual implementation has two layers. You forgot to mention
> the lower layer which is the actual range_run and range classes.
> The range_run class (which is the real engine) has only a few
> member functions. Most importantly, 1) test, 2) set 3) clear.
> test tests for membership. set and clear, ummm, sets and clears
> a range. It also exposes the range iterators. This class is not ref-
> counted. This class is standalone and is not tied to any parser code
> at all. I intend this to be the basis of a more generalized submission.
Thank you, Joel, for your patience with me.
I hadn't thought of them as separable components; I got thrown off
by the reference counting. The range_run<> type is indeed very similar
to extent_set<>, and more complete.
> I may be wrong, but based on what I saw initially, extent_set does
> not allow insertion and removal of overlapping ranges. The test
> program does not perform any. For instance, is it possible to
> do something like this?
>
> s.set(range(0, 100)); // result = [0..100]
> s.add(range(200, 300)); // result = [0..100] [200,300]
> s.set(range(50, 150)); // result = [0..300]
>
> or given the current state: [0..10] [20..30] [40..50] [60..70]
>
> s.set(range(0, 100)); // result = [0..100]
>
> and this?
>
> set.set(range(min_int, max_int)); // min_int..max_int
> set.clear(range(-100,100)); // [min_int..-100] [100..max_int]
Such functionality would be easy to add to extent_set<>, but would involve
a loop to implement. Maybe the loop would only run one cycle or be skipped
entirely in the non-overlapping case. However, it would not be quite so
elegant :-).
>> The preconditions radically simplify the logic of the insert and erase
>> members. They correspond, logically, to the requirement the same memory
>> not be passed to free() twice. More general functions that don't impose
>> such preconditions must contain a loop. They would be easy to add (along
>> with general set union, intersection, and complement-intersection), and I
>> expect that will happen eventually, but they would not be a substitute for
>> the more basic operations.
>
> If I read this correctly, Spirit's low level range_run class might be the
> eventual result you are looking for?
>
> BTW, your other concerns e.g. less-than < vs. generic compare is of
> course trivial to implement. Finally, I am still not sure if a map is a good
> choice, performance wise, nevertheless, a higher level class might
> use a policy to choose between the vector/map implementation.
I agree, we could merge range_set and extent_set with probably minimal
upset to either. A policy argument for whether to use a vector or a map
to represent it, to trade off space against update speed, also seems
workable, in principle.
An important difference is that with a tree representation, incrementing
or decrementing the iterator can be pretty expensive, so the algorithms
need to be careful not to do it unnecessarily. (That is why extent_set<>
stores its extents in reverse order; map<>::lower_bound gives the right
answer, then.)
A survey...
Judging from the postings by Darin, Howard, and Dave, this represents one
of a family of very similar interesting structures. What is common to
them all is not so much their range interface, but rather that they
interpolate coverage of their domain and/or range. They store one
or both endpoints depending on whether they are continuous, and use
a slope of one or zero. (Dave suggested allowing other slopes; this
would allow mapping from, e.g., block number to byte offset. He also
mentioned bidirectional mapping. That might be hard to shoehorn into
a common structure.)
Darin's IncrementalRangeMap, my extent_map<>, and Howard's range_map<>
are almost the same.
Darin's RangeMap differs from the other examples in having a 0 slope; it
could tolerate a non-numeric type in its, er, range. RangeSet, extent_set,
and range_run have what amounts to a boolean, um, range. In all four
cases arithmetic on the domain type is limited to, at most, incrementing
or decrementing.
Treatment of argument domains that overlap existing mappings varies from
"Precondition: not", with elegantly minimal operator code, to "anything
goes". I see no reason not to have both interfaces; the difference is
usually more a matter of preference -- minimalism vs. generality -- than
of semantics or performance.
(BTW, I find the "range" term difficult to use around mapping primitives,
given the existing domain/range usage in the same context. That's another
part of why I have preferred to say "extent".)
It's not clear to me whether all these should be unified under a policy
parameter, or just presented as a family of components. Probably the
latter, although they might be able to share some implementation code.
in the way that std::set<> and std::map<> do.
Dave's cumulative extent tree, a 1D case of Beman's RTree and
Reid's 3D structures, is another productive area to explore.
Happy New Year to all,
Nathan Myers
ncm at cantrip dot org