From: matt_at_[hidden]
Date: 2004-02-15 07:40:23


I think I found a suitable trick to measure boost::function overhead in
release mode on my platform.

I'm getting 14-30 nanoseconds consistently now by forcing the result into
a volatile double:

Results are:
--------------------------------------------------------------------------------
                        looper invasive timing estimate
                               boost_function_call
--------------------------------------------------------------------------------
median time = 33.52057416267943 microseconds
90% range size = 2.514354066985641 microseconds
widest range size (max - min) = 19.96459330143541 microseconds
minimum time = 25.20813397129187 microseconds
maximum time = 45.17272727272728 microseconds
50% range = (33.51961722488039 microseconds, 33.52057416267943 microseconds)
50% range size = 0.9569377990413671 nanoseconds
--------------------------------------------------------------------------------
                        looper invasive timing estimate
                  boost_function_call_equiv_with_in_line_source
--------------------------------------------------------------------------------
median time = 33.50287081339713 microseconds
90% range size = 2.509569377990427 microseconds
widest range size (max - min) = 16.82918660287082 microseconds
minimum time = 25.18851674641148 microseconds
maximum time = 42.0177033492823 microseconds
50% range = (33.50287081339713 microseconds, 33.50382775119618 microseconds)
50% range size = 0.9569377990413671 nanoseconds
Press any key to continue . . .

Which is about an 18ns difference. From this code:

#define MAX_FN_LOOP 1e4

static double not_empty() {
        static double sum;
        static double i;

        sum = 0.0;
        for (i = 0.0; i < MAX_FN_LOOP ; ++i) {
                sum += i * i;
        }
        return sum;
}

inline double boost_function_call( matt::timer& t )
{
        boost::function< double (void)> fn = &not_empty;

        double now;
        volatile double x = 0;
        t.restart();
        x += fn();

        now = t.elapsed();

        return now;
}

inline double boost_function_call_equiv_with_in_line_source( matt::timer& t )
{

        double now;
        volatile double x;
        t.restart();

        static double sum;
        static double i;

        sum = 0.0;
        for (i = 0.0; i < MAX_FN_LOOP ; ++i) {
                sum += i * i;
        }

        x = sum;
        now = t.elapsed();

        return now;
}
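
For anyone who wants to try the same trick without my timer class, here is a minimal sketch under some assumptions: std::function stands in for boost::function, std::chrono::steady_clock stands in for matt::timer, and the names workload, time_wrapped_call_ns and median are made up for illustration. It is not the harness that produced the numbers above, only the same idea of storing the result into a volatile so the optimiser cannot discard the call.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Same sum-of-squares workload as not_empty(), with the loop bound as a parameter.
inline double workload(int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<double>(i) * i;
    }
    return sum;
}

// Time a single call through a type-erased wrapper; returns elapsed nanoseconds.
// Assumption: std::function replaces boost::function, steady_clock replaces matt::timer.
inline double time_wrapped_call_ns(const std::function<double()>& fn) {
    volatile double sink = 0.0;  // the volatile store keeps the call from being optimised away
    const auto start = std::chrono::steady_clock::now();
    sink = fn();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count();
}

// Median of a set of timing samples (upper median for even counts).
inline double median(std::vector<double> samples) {
    assert(!samples.empty());
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}
```

Collecting a few thousand samples of time_wrapped_call_ns and taking the median gives a figure comparable in spirit to the "median time" lines, though the clock resolution on a given platform may be too coarse to resolve a single call.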

If I change the size of the loop to something much smaller, say 1e1, I get
a reasonably consistent result now:

--------------------------------------------------------------------------------
                        looper invasive timing estimate
                               boost_function_call
--------------------------------------------------------------------------------
median time = 60.28708133971293 nanoseconds
90% range size = 0.9569377990430612 nanoseconds
widest range size (max - min) = 2.90622009569378 microseconds
minimum time = 60.28708133971293 nanoseconds
maximum time = 2.966507177033494 microseconds
50% range = (60.28708133971293 nanoseconds, 61.244019138756 nanoseconds)
50% range size = 0.9569377990430612 nanoseconds
--------------------------------------------------------------------------------
                        looper invasive timing estimate
                  boost_function_call_equiv_with_in_line_source
--------------------------------------------------------------------------------
median time = 43.54066985645935 nanoseconds
90% range size = nanoseconds
widest range size (max - min) = 34.92822966507178 nanoseconds
minimum time = 43.54066985645935 nanoseconds
maximum time = 78.46889952153111 nanoseconds
50% range = (43.54066985645935 nanoseconds, 43.54066985645935 nanoseconds)
50% range size = nanoseconds
Press any key to continue . . .

About a 17ns difference. Pretty consistent with the previous measurement
even though the function workload differs by three orders of magnitude
(1e4 versus 1e1 loop iterations).
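
The arithmetic behind the two differences can be sketched as below. The helper names overhead_ns and relative_overhead are hypothetical; the inputs are the four median times reported above, converted to nanoseconds.

```cpp
#include <cassert>
#include <cmath>

// Absolute overhead estimate: wrapped median minus inlined median, both in ns.
inline double overhead_ns(double wrapped_median_ns, double inlined_median_ns) {
    assert(std::isfinite(wrapped_median_ns) && std::isfinite(inlined_median_ns));
    return wrapped_median_ns - inlined_median_ns;
}

// The same overhead expressed as a fraction of the inlined median.
inline double relative_overhead(double wrapped_median_ns, double inlined_median_ns) {
    assert(inlined_median_ns > 0.0);
    return overhead_ns(wrapped_median_ns, inlined_median_ns) / inlined_median_ns;
}

// Medians from the two runs above, in nanoseconds.
const double big_loop_overhead   = overhead_ns(33520.57416267943, 33502.87081339713);  // 1e4 loop
const double small_loop_overhead = overhead_ns(60.28708133971293, 43.54066985645935);  // 1e1 loop
```

The absolute figures come out at roughly 17.7ns and 16.7ns. As a fraction of the inlined time that is well under 0.1% for the 1e4 loop but somewhere near 38% for the 1e1 loop, so the same fixed cost matters very differently depending on how much work the callee does.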

I think my quoteable message for Doug would read:

<message>
The cost of boost::function can be reasonably consistently measured at
around 20ns +/- 10 ns on a modern >2GHz platform versus directly inlining
the code.

However, the performance of your application may benefit from or be
disadvantaged by boost::function depending on how your C++ optimiser
optimises. Similar to a standard function pointer, differences on the order
of 10% have been noted to the _benefit_ or _disadvantage_ of using
boost::function to call a function that contains a tight loop depending on
your compilation circumstances.
</message>

HTH...

Which is where I'll leave it. I think I'm satisfied with my lack of
understanding of this trivial trivia now.

Regards,

Matt Hurd.