Boost mailing page: RE: [boost] boost::bind is excellent


From: Steve Anichini (sanichin_at_[hidden])
Date: 2001-11-06 22:21:29

Next message: darylew_at_[hidden]: "About dlw_gcd.zip 11"
Previous message: Dietmar Kuehl: "Re: [boost] The one thing I like better about <boost/array_traits.hpp> over <boost/array.hpp>"
In reply to: Greg Colvin: "Re: [boost] boost::bind is excellent"
Next in thread: Peter Dimov: "Re: [boost] boost::bind is excellent"

> -----Original Message-----
> From: Greg Colvin [mailto:gcolvin_at_[hidden]]
> [...]
>
> > ... I've got a question: what about performance of using with standard
> > algorithms, how much is differ from hand made code?
> >
> > Vladimir
>
> Good question. The only way to answer it for sure is to write
> your code with and without bind and compare the results.
>

Funny you should mention that, as a friend asked me the same thing so I
decided to take a look at the assembly output of boost::bind vs calling a
function directly through a function pointer on two compilers.

Note that a) this doesn't really give you any indication of real
performance, as it's a contrived example, and b) because the results vary
widely even between just two compilers, it's going to depend a lot on how
good your compiler is. Because of a) and b) the best answer is Greg's, but I
felt this might be somewhat illuminating.
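
For anyone who wants to try Greg's suggestion directly, here is a minimal sketch of a with/without comparison. It is not the code I measured: it assumes a C++11 compiler, it uses std::bind purely as a stand-in for boost::bind, and Accumulator, sum_by_hand, sum_with_bind, time_us and compare_once are hypothetical names made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical workload: a member function called once per element.
struct Accumulator {
    long total = 0;
    void add(int value) { total += value; }
};

// Hand-written loop that calls the member function directly.
long sum_by_hand(const std::vector<int>& data) {
    Accumulator acc;
    for (int v : data) acc.add(v);
    return acc.total;
}

// Same work through a binder handed to a standard algorithm.
// std::bind is an assumption here, standing in for boost::bind.
long sum_with_bind(const std::vector<int>& data) {
    using namespace std::placeholders;
    Accumulator acc;
    std::for_each(data.begin(), data.end(),
                  std::bind(&Accumulator::add, &acc, _1));
    return acc.total;
}

// Runs f(data) once, stores its result, and returns elapsed microseconds.
template <class F>
long long time_us(F f, const std::vector<int>& data, long& result) {
    auto start = std::chrono::steady_clock::now();
    result = f(data);
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Times both versions once and checks that they computed the same thing.
void compare_once(const std::vector<int>& data, long long& hand_us, long long& bind_us) {
    long by_hand = 0;
    long with_bind = 0;
    hand_us = time_us(sum_by_hand, data, by_hand);
    bind_us = time_us(sum_with_bind, data, with_bind);
    assert(by_hand == with_bind);  // timings are meaningless if the results differ
}
```

A single run like this is very noisy, and an optimizer may simplify a loop this small in either version, so treat the two numbers as a starting point and repeat the measurement on realistic data with your real build settings.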

I created a contrived example in order to compare the compiler output. It
follows:

// begin code
#include "boost/bind.hpp"
#include <stdio.h>

class Foo
{
public:
        Foo() {}
        int call2(int a1, char a2)
        {
                printf("Call 2: (%d, %c)\n", a1, a2);
                return 0;
        }
};

void call1()
{
        Foo foo;
        int x = 5;
        boost::bind(&Foo::call2, &foo, _1, 'c')(x);
}

void call2()
{
        Foo foo;
        int (Foo::*pFoo)(int, char) = &Foo::call2;
        int x = 5;
        ((&foo)->*pFoo)(x, 'c');
}

int main(int argc, char* argv[])
{
        call1();
        call2();
        return 0;
}
// end code

The generated assembly on MSVC 6.0 (cl.exe 12.00.8804) with "Maximize speed"
optimization settings, Pentium Pro as processor, debug info, inline any
appropriate, C++ exceptions, and RTTI turned on:

34: int main(int argc, char* argv[])
35: {
00401060 sub esp,0Ch
36: call1();
00401063 lea ecx,[esp]
00401067 lea eax,[esp+8]
0040106B push ecx
0040106C lea ecx,[esp+8]
00401070 mov dword ptr [esp+4],eax
00401074 call boost::_bi::value<Foo *>::value<Foo *> (004010a0)
00401079 mov ecx,dword ptr [esp+4]
0040107D push 63h
0040107F push 5
00401081 call Foo::call2 (00401030)
37: call2();
00401086 push 63h
00401088 push 5
0040108A lea ecx,[esp+10h]
0040108E call Foo::call2 (00401030)
38: return 0;
00401093 xor eax,eax
39: }
00401095 add esp,0Ch
00401098 ret
41: value(T const & t): t_(t) {}
004010A0 mov eax,ecx
004010A2 mov ecx,dword ptr [esp+4]
004010A6 mov edx,dword ptr [ecx]
004010A8 mov dword ptr [eax],edx
004010AA ret 4

So VC was able to strip out or inline most of the temporaries and
constructor calls. The only one remaining is the value<Foo *> constructor
(boost::_bi::value<Foo *>::value<Foo *>).

The assembly output of Metrowerks CodeWarrior 7 for Windows, Pentium Pro as
processor, smart inlining, auto-inline, global optimizations level 4, pool
strings, C++ Exceptions, RTTI turned on:

Function: _main

; 35: {

; 36: call1();

00000: 0000 55 PUSH EBP
00008: 0001 8A 15 00000000 MOV DL, BYTE PTR .bss+00000000
00000: 0007 89 E5 MOV EBP, ESP
00000: 0009 56 PUSH ESI
00000: 000A 57 PUSH EDI
00000: 000B 83 EC 40 SUB ESP, 00000040
00008: 000E 8D 45 FFFFFFD7 LEA EAX, DWORD PTR FFFFFFD7[EBP]
00008: 0011 6A 63 PUSH 00000063
00008: 0013 52 PUSH EDX
00008: 0014 50 PUSH EAX
00008: 0015 8D 45 FFFFFFB8 LEA EAX, DWORD PTR FFFFFFB8[EBP]
00008: 0018 FF 35 00000008 PUSH DWORD PTR __at_192+00000008
00008: 001E FF 35 00000004 PUSH DWORD PTR __at_192+00000004
00008: 0024 FF 35 00000000 PUSH DWORD PTR __at_192
00008: 002A 50 PUSH EAX
00008: 002B E8 00000000 CALL SHORT
?bind@?$$HVFoo@@HDPAV1_at_V?$arg@$00@_bi_at_boost@@D_at_4@YA?AV?$bind_t_at_HV?$mf2_at_HVFoo
@@HD@_mfi_at_boost@@V?$list3_at_V?$value_at_PAVFoo@@@_bi_at_boost@@V?$arg@$00_at_23@V?$valu
e_at_D@23@@_bi_at_3@@34_at_P81@AEHHD_at_ZPAV1@V234_at_D@Z
00008: 0030 8D 55 FFFFFFB8 LEA EDX, DWORD PTR FFFFFFB8[EBP]
00008: 0033 8D 7D FFFFFFE8 LEA EDI, DWORD PTR FFFFFFE8[EBP]
00008: 0036 8D 32 LEA ESI, DWORD PTR 00000000[EDX]
00008: 0038 83 C4 1C ADD ESP, 0000001C
00008: 003B A5 MOVSD
00008: 003C A5 MOVSD
00008: 003D A5 MOVSD
00008: 003E A5 MOVSD
00008: 003F 8A 42 15 MOV AL, BYTE PTR 00000015[EDX]
00008: 0042 8B 4A 10 MOV ECX, DWORD PTR 00000010[EDX]
00008: 0045 50 PUSH EAX
00008: 0046 8D 45 FFFFFFE8 LEA EAX, DWORD PTR FFFFFFE8[EBP]
00008: 0049 6A 05 PUSH 00000005
00008: 004B 50 PUSH EAX
00008: 004C E8 00000000 CALL SHORT ___ptmf_scall, 0000000C

; 37: call2();

00008: 0051 BE 00000000 MOV ESI, OFFSET __at_230
00008: 0056 8D 7D FFFFFFD8 LEA EDI, DWORD PTR FFFFFFD8[EBP]
00008: 0059 A5 MOVSD
00008: 005A 6A 63 PUSH 00000063
00008: 005C 6A 05 PUSH 00000005
00008: 005E A5 MOVSD
00008: 005F A5 MOVSD
00008: 0060 8D 45 FFFFFFD8 LEA EAX, DWORD PTR FFFFFFD8[EBP]
00008: 0063 8D 4D FFFFFFE7 LEA ECX, DWORD PTR FFFFFFE7[EBP]
00008: 0066 50 PUSH EAX
00008: 0067 E8 00000000 CALL SHORT ___ptmf_scall, 0000000C

; 38: return 0;

00008: 006C 31 C0 XOR EAX, EAX
00000: 006E L0000:
00000: 006E 8D 65 FFFFFFF8 LEA ESP, DWORD PTR FFFFFFF8[EBP]
00000: 0071 5F POP EDI
00000: 0072 5E POP ESI
00000: 0073 5D POP EBP
00000: 0074 C3 RETN

Function:
?bind@?$$HVFoo@@HDPAV1_at_V?$arg@$00@_bi_at_boost@@D_at_4@YA?AV?$bind_t_at_HV?$mf2_at_HVFoo
@@HD@_mfi_at_boost@@V?$list3_at_V?$value_at_PAVFoo@@@_bi_at_boost@@V?$arg@$00_at_23@V?$valu
e_at_D@23@@_bi_at_3@@34_at_P81@AEHHD_at_ZPAV1@V234_at_D@Z

; 1079: {

; 1082: return _bi::bind_t<R, F, list_type>(F(f), list_type(a1, a2, a3));

00000: 0000 55 PUSH EBP
00000: 0001 89 E5 MOV EBP, ESP
00000: 0003 56 PUSH ESI
00000: 0004 57 PUSH EDI
00000: 0005 83 EC 3C SUB ESP, 0000003C
00000: 0008 83 E4 FFFFFFF8 AND ESP, FFFFFFF8
00008: 000B 8A 55 20 MOV DL, BYTE PTR 00000020[EBP]
00008: 000E 8B 75 18 MOV ESI, DWORD PTR 00000018[EBP]
00000: 0011 8B 45 08 MOV EAX, DWORD PTR 00000008[EBP]
00008: 0014 89 F7 MOV EDI, ESI
00008: 0016 88 54 24 1F MOV BYTE PTR 0000001F[ESP], DL
00008: 001A 8A 4C 24 1F MOV CL, BYTE PTR 0000001F[ESP]
00008: 001E 89 74 24 18 MOV DWORD PTR 00000018[ESP], ESI
00008: 0022 8D 75 0C LEA ESI, DWORD PTR 0000000C[EBP]
00008: 0025 88 4C 24 37 MOV BYTE PTR 00000037[ESP], CL
00008: 0029 8A 4C 24 37 MOV CL, BYTE PTR 00000037[ESP]
00008: 002D 89 7C 24 30 MOV DWORD PTR 00000030[ESP], EDI
00008: 0031 8D 3C 24 LEA EDI, DWORD PTR 00000000[ESP]
00008: 0034 8A 55 1C MOV DL, BYTE PTR 0000001C[EBP]
00008: 0037 88 4C 24 15 MOV BYTE PTR 00000015[ESP], CL
00008: 003B 88 54 24 14 MOV BYTE PTR 00000014[ESP], DL
00008: 003F A5 MOVSD
00008: 0040 A5 MOVSD
00008: 0041 A5 MOVSD
00008: 0042 8D 34 24 LEA ESI, DWORD PTR 00000000[ESP]
00008: 0045 8D 7C 24 20 LEA EDI, DWORD PTR 00000020[ESP]
00008: 0049 A5 MOVSD
00008: 004A A5 MOVSD
00008: 004B A5 MOVSD
00008: 004C A5 MOVSD
00008: 004D 8D 74 24 20 LEA ESI, DWORD PTR 00000020[ESP]
00008: 0051 8D 38 LEA EDI, DWORD PTR 00000000[EAX]
00008: 0053 A5 MOVSD
00008: 0054 A5 MOVSD
00008: 0055 A5 MOVSD
00008: 0056 8B 74 24 30 MOV ESI, DWORD PTR 00000030[ESP]
00008: 005A 89 70 10 MOV DWORD PTR 00000010[EAX], ESI
00008: 005D 8A 54 24 14 MOV DL, BYTE PTR 00000014[ESP]
00008: 0061 88 50 14 MOV BYTE PTR 00000014[EAX], DL
00008: 0064 8A 4C 24 15 MOV CL, BYTE PTR 00000015[ESP]
00008: 0068 88 48 15 MOV BYTE PTR 00000015[EAX], CL
00000: 006B L0000:
00000: 006B 8D 65 FFFFFFF8 LEA ESP, DWORD PTR FFFFFFF8[EBP]
00000: 006E 5F POP EDI
00000: 006F 5E POP ESI
00000: 0070 5D POP EBP
00000: 0071 C3 RETN

As you can see, CodeWarrior generates a lot more code for the bind version!
For example, the bind() function that constructs the bind_t<> object is not
inlined, and it looks like it does a lot more copying.

What does this mean? Not a whole lot, unfortunately. "There are lies, damn
lies, and benchmarks". This is just an example of assembly generated for a
specific contrived case on two specific compilers. I don't know if we can
extrapolate a whole lot from this specific case. Still, it's interesting to
look at. I'd be interested in seeing what other compilers generate on x86
for the same code.

One thing I'll note: while I have not looked at the generated assembly, my
gut feeling is that doing something like this

// std::unary_function<Argument, Result> is declared in <functional>
struct FooCall2 : public std::unary_function<char, int>
{
public:
        FooCall2(Foo *pFoo, int bindX) : mpFoo(pFoo), mBindX(bindX) {}

        int operator()(char c) const
        {
                return mpFoo->call2(mBindX, c);
        }
protected:
        Foo *mpFoo;
        int mBindX;
};

// ...
FooCall2 (&foo, 5)('x');

will usually generate better code than a boost::bind() call that
accomplishes the same thing:

boost::bind(&Foo::call2, &foo, 5, _1)('x');

My rationale is the former doesn't use any function pointers. Function
pointers can prevent compilers from inlining calls.
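
To make that rationale concrete, here is a small sketch of the two shapes of call. It is an illustration rather than anything I compiled and inspected: scale, Scale, sum_via_pointer, sum_via_functor and check_same_result are hypothetical names, and whether either call actually gets inlined still depends entirely on the compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the work done by the bound function.
inline int scale(int x, int factor) { return x * factor; }

// The callee arrives as a run-time value. Unless the optimizer can prove
// which function fn points to, every iteration makes an indirect call.
int sum_via_pointer(const int* data, std::size_t n,
                    int (*fn)(int, int), int factor) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += fn(data[i], factor);
    return total;
}

// Hand-written function object: the call target is fixed by the type.
struct Scale {
    int factor;
    explicit Scale(int f) : factor(f) {}
    int operator()(int x) const { return scale(x, factor); }
};

// F::operator() is known when the template is instantiated, so the
// compiler at least has the option of inlining the call.
template <class F>
int sum_via_functor(const int* data, std::size_t n, F f) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += f(data[i]);
    return total;
}

// Both versions must compute the same sum.
void check_same_result() {
    const int data[] = {1, 2, 3};
    assert(sum_via_pointer(data, 3, &scale, 2) ==
           sum_via_functor(data, 3, Scale(2)));
}
```

A binder has to store the function pointer or member function pointer as data, which puts it closer to the first version unless the compiler manages to see through the stored pointer.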

But as we all know, efficiency isn't everything. boost::bind offers much
more flexibility than the former approach. For example, what if we needed
to bind the character instead and leave the integer argument unbound? That
would require a different version of FooCall2, whereas with bind it's just a
matter of changing a couple of arguments. That kind of elegance and coding
efficiency is more than worth any potential run-time penalty. And given the
80/20 rule, most of the time the run-time penalty will not affect overall
performance of the program. For the 20% of the cases where it does, you can
always fall back on an alternative method for those few sections of code.
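
As a rough sketch of that flexibility, the two bindings differ only in their argument lists. The assumptions here: std::bind stands in for boost::bind, this Foo::call2 returns a made-up value instead of printing so that the two calls are easy to compare, and bind_int, bind_char and check_bindings_agree are hypothetical helper names.

```cpp
#include <cassert>
#include <functional>

class Foo {
public:
    // Assumption: returns an easily checked value rather than printing.
    int call2(int a1, char a2) { return a1 * 1000 + a2; }
};

// Integer bound up front, character supplied at the call.
int bind_int(Foo& foo, int bound, char c) {
    using namespace std::placeholders;
    return std::bind(&Foo::call2, &foo, bound, _1)(c);
}

// Character bound up front, integer supplied at the call.
// Only the argument list changed; no new functor class was needed.
int bind_char(Foo& foo, char bound, int x) {
    using namespace std::placeholders;
    return std::bind(&Foo::call2, &foo, _1, bound)(x);
}

// Both binders end up calling foo.call2(5, 'c').
void check_bindings_agree() {
    Foo foo;
    assert(bind_int(foo, 5, 'c') == bind_char(foo, 'c', 5));
}
```

With the hand-written approach, the second case would need a second class alongside FooCall2 that stores a char and takes an int in operator().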

-steve anichini
