From: Doug Gregor (dgregor_at_[hidden])
Date: 2004-12-02 11:25:50


On Dec 2, 2004, at 9:32 AM, Robert Zeh wrote:
> At home I decided to try replacing the shared_ptr in
> slot_call_iterator with an instance variable of the result_type.
> Since slot_call_iterator uses the shared_ptr to determine if the cache
> is valid, I had to add a bool indicating if the cached value is valid
> or not. These changes made the benchmark run about two to four times
> faster (on my home system, a 650 MHz Duron with gcc 3.3.2 running
> Debian). I expect better results with my SPARC machine, because the
> malloc implementation seems to be slower.

Interesting! I never would have expected that to be such a bottleneck.

> There are several drawbacks to my modifications. It's harder to
> maintain because of the added bool. The result_type used by
> slot_call_iterator must now have a default constructor. If it is
> expensive to copy the result_type, and slot_call_iterator is copied a
> lot, replacing the shared_ptr with an instance variable will actually
>
> make things slower.
> I don't know enough about the internals to weigh how important these
> issues are.

Unfortunately, some of them are pretty important. For instance, if one
tried to have a result_type that is an iterator, copying a
slot_call_iterator that has not been dereferenced could result in
undefined behavior, because the iterator would have been
default-constructed and therefore singular. That doesn't make the
optimization invalid, of course, but it does limit where we can apply
the optimization.

> I believe the correct way to do things is to create a cache interface
> that encapsulates the behavior the slot_call_iterator needs, and to
> then choose the appropriate cache implementation at runtime using the
> mpl.

We could do it at compile time using some combination of type traits,
probably. Unfortunately, those that come to
mind--is_default_constructible and is_assignable--don't give us all the
information we need, because iterators will return "true" for both but
will have the wrong behavior. We would need to introduce a new trait
for this.

Actually, there is one other corner case that's really ugly. The
combiner could theoretically do something like this:

template<typename InputIterator>
result_type operator()(InputIterator first, InputIterator last)
{
   while (first != last) {
     InputIterator annoying = first;
     *first;
     if (*annoying) /* ... */
   }
}

I *think* that's valid code according to the Input Iterator
requirements (but not 100% sure!), and without the pointer "*annoying"
would call the slot again...

Oh, I know a solution now... we stick an "optional<R> slot_result" into
the signal's operator() and keep a pointer to that variable in the
slot_call_iterator. The optional<R> has nearly the same effect as the
cache interface you mentioned, but it doesn't require
default-constructibility; storing a pointer to it ensures that
slot_iterator copies are fast and that all slot_iterators see the
updates (for those truly pathological combiners).

I'm really interested in this thread but, as seems to happen, it's come
at a time when I can't dedicate a whole lot of time to it. After
December 10th, I'm all over it :) The timing tests, "lite" versions,
performance results and performance analyses are greatly appreciated...
with any luck we'll be able to improve Signals performance or come up
with its replacement :)

On the technical side, I expect the the orders-of-magnitude speed lag
in Signals comes from the named_slot_map. Jody's "lite" version does
not support named slots. Perhaps that feature is too painful to have,
and that we should just drop back to using an std::list as Jody and
perhaps provide an alternative signal type that allows named slots.

        Doug