From: Steve Anichini (sanichin_at_[hidden])
Date: 2001-11-06 22:21:29
> -----Original Message-----
> From: Greg Colvin [mailto:gcolvin_at_[hidden]]
> [...]
>
> > ... I've got a question: what about performance of using with standard
> > algorithms, how much is differ from hand made code?
> >
> > Vladimir
>
> Good question.  The only way to answer it for sure is to write
> your code with and without bind and compare the results.
>
Funny you should mention that, as a friend asked me the same thing so I
decided to take a look at the assembly output of boost::bind vs calling a
function directly through a function pointer on two compilers.
Note that a) this doesn't give you any indication of real performance, as
it's a contrived example, and b) because the results vary widely even
between just two compilers, it's going to depend a lot on how good your
compiler is. Because of a) and b) the best answer is Greg's, but I felt
this might be somewhat illuminating.
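Greg's suggestion (write it both ways and compare) can be sketched as a small timing harness. This is only an illustration and not the code measured in this post: it assumes a C++11 compiler and uses std::bind and std::chrono as stand-ins for boost::bind and a platform timer, and the names Accum, sum_direct, sum_bound and time_ns are made up for the example.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical workload: a member function cheap enough that call
// overhead is a visible part of the cost.
struct Accum
{
    long total;
    Accum() : total(0) {}
    int add(int a, char c) { total += a + (c == 'c' ? 1 : 0); return 0; }
};

// Call the member function through a plain pointer to member.
long sum_direct(int n)
{
    Accum acc;
    int (Accum::*pmf)(int, char) = &Accum::add;
    for (int i = 0; i < n; ++i)
        (acc.*pmf)(i, 'c');
    return acc.total;
}

// Same work, routed through a bind expression (std::bind here, as an
// assumed stand-in for boost::bind).
long sum_bound(int n)
{
    using namespace std::placeholders;
    Accum acc;
    auto f = std::bind(&Accum::add, &acc, _1, 'c');
    for (int i = 0; i < n; ++i)
        f(i);
    return acc.total;
}

// Time one run of fn(n) in nanoseconds. The computed sum is stored in
// *out so that the result of the work is actually used by the caller.
template <class Fn>
long long time_ns(Fn fn, int n, long* out)
{
    assert(out != 0);
    auto t0 = std::chrono::steady_clock::now();
    *out = fn(n);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Calling time_ns(sum_direct, n, &r) and time_ns(sum_bound, n, &r) with a large n and optimizations turned on gives two numbers to compare. Expect them to differ from one compiler to the next.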
I created a contrived example in order to compare the compiler output. It
follows:
// begin code
#include "boost/bind.hpp"
#include <stdio.h>
class Foo
{
public:
    Foo() {}
    int call2(int a1, char a2)
    {
        printf("Call 2: (%d, %c)\n", a1, a2);
        return 0;
    }
};
void call1()
{
        Foo foo;
        int x = 5;
        boost::bind(&Foo::call2, &foo, _1, 'c')(x);
}
void call2()
{
        Foo foo;
        int (Foo::*pFoo)(int, char) = &Foo::call2;
        int x = 5;
        ((&foo)->*pFoo)(x, 'c');
}
int main(int argc, char* argv[])
{
        call1();
        call2();
        return 0;
}
// end code
The generated assembly on MSVC 6.0 (cl.exe 12.00.8804) with "Maximize speed"
optimization settings, Pentium Pro as processor, debug info, inline any
appropriate, C++ exceptions, and RTTI turned on:
34:   int main(int argc, char* argv[])
35:   {
00401060   sub         esp,0Ch
36:       call1();
00401063   lea         ecx,[esp]
00401067   lea         eax,[esp+8]
0040106B   push        ecx
0040106C   lea         ecx,[esp+8]
00401070   mov         dword ptr [esp+4],eax
00401074   call        boost::_bi::value<Foo *>::value<Foo *> (004010a0)
00401079   mov         ecx,dword ptr [esp+4]
0040107D   push        63h
0040107F   push        5
00401081   call        Foo::call2 (00401030)
37:       call2();
00401086   push        63h
00401088   push        5
0040108A   lea         ecx,[esp+10h]
0040108E   call        Foo::call2 (00401030)
38:       return 0;
00401093   xor         eax,eax
39:   }
00401095   add         esp,0Ch
00401098   ret
41:       value(T const & t): t_(t) {}
004010A0   mov         eax,ecx
004010A2   mov         ecx,dword ptr [esp+4]
004010A6   mov         edx,dword ptr [ecx]
004010A8   mov         dword ptr [eax],edx
004010AA   ret         4
So VC was able to strip out/inline a lot of the temporaries and constructor
calls. The only one remaining is the constructor to value<Foo *>::value<Foo
*>.
The assembly output of Metrowerks CodeWarrior 7 for Windows, Pentium Pro as
processor, smart inlining, auto-inline, global optimizations level 4, pool
strings, C++ Exceptions, RTTI turned on:
Function: _main
; 35: {
; 36: 	call1();
00000: 0000 55                          PUSH EBP
00008: 0001 8A 15 00000000              MOV DL, BYTE PTR .bss+00000000
00000: 0007 89 E5                       MOV EBP, ESP
00000: 0009 56                          PUSH ESI
00000: 000A 57                          PUSH EDI
00000: 000B 83 EC 40                    SUB ESP, 00000040
00008: 000E 8D 45 FFFFFFD7              LEA EAX, DWORD PTR FFFFFFD7[EBP]
00008: 0011 6A 63                       PUSH 00000063
00008: 0013 52                          PUSH EDX
00008: 0014 50                          PUSH EAX
00008: 0015 8D 45 FFFFFFB8              LEA EAX, DWORD PTR FFFFFFB8[EBP]
00008: 0018 FF 35 00000008              PUSH DWORD PTR __at_192+00000008
00008: 001E FF 35 00000004              PUSH DWORD PTR __at_192+00000004
00008: 0024 FF 35 00000000              PUSH DWORD PTR __at_192
00008: 002A 50                          PUSH EAX
00008: 002B E8 00000000                 CALL SHORT
?bind@?$$HVFoo@@HDPAV1@V?$arg@$00@_bi@boost@@D@4@YA?AV?$bind_t@HV?$mf2@HVFoo@@HD@_mfi@boost@@V?$list3@V?$value@PAVFoo@@@_bi@boost@@V?$arg@$00@23@V?$value@D@23@@_bi@3@@34@P81@AEHHD@ZPAV1@V234@D@Z
00008: 0030 8D 55 FFFFFFB8              LEA EDX, DWORD PTR FFFFFFB8[EBP]
00008: 0033 8D 7D FFFFFFE8              LEA EDI, DWORD PTR FFFFFFE8[EBP]
00008: 0036 8D 32                       LEA ESI, DWORD PTR 00000000[EDX]
00008: 0038 83 C4 1C                    ADD ESP, 0000001C
00008: 003B A5                          MOVSD
00008: 003C A5                          MOVSD
00008: 003D A5                          MOVSD
00008: 003E A5                          MOVSD
00008: 003F 8A 42 15                    MOV AL, BYTE PTR 00000015[EDX]
00008: 0042 8B 4A 10                    MOV ECX, DWORD PTR 00000010[EDX]
00008: 0045 50                          PUSH EAX
00008: 0046 8D 45 FFFFFFE8              LEA EAX, DWORD PTR FFFFFFE8[EBP]
00008: 0049 6A 05                       PUSH 00000005
00008: 004B 50                          PUSH EAX
00008: 004C E8 00000000                 CALL SHORT ___ptmf_scall, 0000000C
; 37: 	call2();
00008: 0051 BE 00000000                 MOV ESI, OFFSET __at_230
00008: 0056 8D 7D FFFFFFD8              LEA EDI, DWORD PTR FFFFFFD8[EBP]
00008: 0059 A5                          MOVSD
00008: 005A 6A 63                       PUSH 00000063
00008: 005C 6A 05                       PUSH 00000005
00008: 005E A5                          MOVSD
00008: 005F A5                          MOVSD
00008: 0060 8D 45 FFFFFFD8              LEA EAX, DWORD PTR FFFFFFD8[EBP]
00008: 0063 8D 4D FFFFFFE7              LEA ECX, DWORD PTR FFFFFFE7[EBP]
00008: 0066 50                          PUSH EAX
00008: 0067 E8 00000000                 CALL SHORT ___ptmf_scall, 0000000C
; 38: 	return 0;
00008: 006C 31 C0                       XOR EAX, EAX
00000: 006E                     L0000:
00000: 006E 8D 65 FFFFFFF8              LEA ESP, DWORD PTR FFFFFFF8[EBP]
00000: 0071 5F                          POP EDI
00000: 0072 5E                          POP ESI
00000: 0073 5D                          POP EBP
00000: 0074 C3                          RETN
Function:
?bind@?$$HVFoo@@HDPAV1@V?$arg@$00@_bi@boost@@D@4@YA?AV?$bind_t@HV?$mf2@HVFoo@@HD@_mfi@boost@@V?$list3@V?$value@PAVFoo@@@_bi@boost@@V?$arg@$00@23@V?$value@D@23@@_bi@3@@34@P81@AEHHD@ZPAV1@V234@D@Z
; 1079: {
; 1082:     return _bi::bind_t<R, F, list_type>(F(f), list_type(a1, a2, a3));
00000: 0000 55                          PUSH EBP
00000: 0001 89 E5                       MOV EBP, ESP
00000: 0003 56                          PUSH ESI
00000: 0004 57                          PUSH EDI
00000: 0005 83 EC 3C                    SUB ESP, 0000003C
00000: 0008 83 E4 FFFFFFF8              AND ESP, FFFFFFF8
00008: 000B 8A 55 20                    MOV DL, BYTE PTR 00000020[EBP]
00008: 000E 8B 75 18                    MOV ESI, DWORD PTR 00000018[EBP]
00000: 0011 8B 45 08                    MOV EAX, DWORD PTR 00000008[EBP]
00008: 0014 89 F7                       MOV EDI, ESI
00008: 0016 88 54 24 1F                 MOV BYTE PTR 0000001F[ESP], DL
00008: 001A 8A 4C 24 1F                 MOV CL, BYTE PTR 0000001F[ESP]
00008: 001E 89 74 24 18                 MOV DWORD PTR 00000018[ESP], ESI
00008: 0022 8D 75 0C                    LEA ESI, DWORD PTR 0000000C[EBP]
00008: 0025 88 4C 24 37                 MOV BYTE PTR 00000037[ESP], CL
00008: 0029 8A 4C 24 37                 MOV CL, BYTE PTR 00000037[ESP]
00008: 002D 89 7C 24 30                 MOV DWORD PTR 00000030[ESP], EDI
00008: 0031 8D 3C 24                    LEA EDI, DWORD PTR 00000000[ESP]
00008: 0034 8A 55 1C                    MOV DL, BYTE PTR 0000001C[EBP]
00008: 0037 88 4C 24 15                 MOV BYTE PTR 00000015[ESP], CL
00008: 003B 88 54 24 14                 MOV BYTE PTR 00000014[ESP], DL
00008: 003F A5                          MOVSD
00008: 0040 A5                          MOVSD
00008: 0041 A5                          MOVSD
00008: 0042 8D 34 24                    LEA ESI, DWORD PTR 00000000[ESP]
00008: 0045 8D 7C 24 20                 LEA EDI, DWORD PTR 00000020[ESP]
00008: 0049 A5                          MOVSD
00008: 004A A5                          MOVSD
00008: 004B A5                          MOVSD
00008: 004C A5                          MOVSD
00008: 004D 8D 74 24 20                 LEA ESI, DWORD PTR 00000020[ESP]
00008: 0051 8D 38                       LEA EDI, DWORD PTR 00000000[EAX]
00008: 0053 A5                          MOVSD
00008: 0054 A5                          MOVSD
00008: 0055 A5                          MOVSD
00008: 0056 8B 74 24 30                 MOV ESI, DWORD PTR 00000030[ESP]
00008: 005A 89 70 10                    MOV DWORD PTR 00000010[EAX], ESI
00008: 005D 8A 54 24 14                 MOV DL, BYTE PTR 00000014[ESP]
00008: 0061 88 50 14                    MOV BYTE PTR 00000014[EAX], DL
00008: 0064 8A 4C 24 15                 MOV CL, BYTE PTR 00000015[ESP]
00008: 0068 88 48 15                    MOV BYTE PTR 00000015[EAX], CL
00000: 006B                     L0000:
00000: 006B 8D 65 FFFFFFF8              LEA ESP, DWORD PTR FFFFFFF8[EBP]
00000: 006E 5F                          POP EDI
00000: 006F 5E                          POP ESI
00000: 0070 5D                          POP EBP
00000: 0071 C3                          RETN
As you can see, CodeWarrior generates a lot more code for the bind version!
For example, the call to bind() itself is not inlined, and the function
spends most of its instructions copying the bind_t<> object and its
arguments around.
What does this mean? Not a whole lot, unfortunately. "There are lies, damn
lies, and benchmarks". This is just an example of assembly generated for a
specific contrived case on two specific compilers. I don't know if we can
extrapolate a whole lot from this specific case. Still, it's interesting to
look at. I'd be interested in seeing what other compilers generate on x86
for the same code.
One more observation: although I have not looked at the generated assembly,
my gut feeling is that doing something like this
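// needs #include <functional>; unary_function takes <argument, result>
struct FooCall2 : public std::unary_function<char, int>
{
public:
        FooCall2(Foo *pFoo, int bindX) : mpFoo(pFoo), mBindX(bindX) {}
        // Forward to Foo::call2 with the stored integer and the given char.
        int operator()(char c) const
        {
                return mpFoo->call2(mBindX, c);
        }
protected:
        Foo *mpFoo;
        int mBindX;
};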
// ...
FooCall2 (&foo, 5)('x');
will usually generate better code than a boost::bind() call that
accomplishes the same thing:
boost::bind(&Foo::call2, &foo, 5, _1)('x');
My rationale is that the former doesn't call through a pointer to member
function. Calls through function pointers can prevent compilers from
inlining.
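To make the comparison concrete in the setting of the original question (standard algorithms), here is a minimal sketch that applies both approaches through std::transform. It is an illustration under assumptions, not a measurement: std::bind (C++11) stands in for boost::bind, Foo::call2 is simplified to return a value instead of printing, and with_functor, with_bind and check_equivalent are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Simplified stand-in for the Foo class in this post: call2 returns a
// value instead of printing so the two approaches can be compared.
class Foo
{
public:
    int call2(int a1, char a2) const { return a1 + (a2 - 'a'); }
};

// Hand-written function object: Foo::call2 is called by name, so no
// pointer to member is stored or called through.
struct FooCall2
{
    FooCall2(const Foo* pFoo, int bindX) : mpFoo(pFoo), mBindX(bindX) {}
    int operator()(char c) const { return mpFoo->call2(mBindX, c); }
private:
    const Foo* mpFoo;
    int mBindX;
};

// Apply the hand-written functor to every character.
std::vector<int> with_functor(const Foo& foo, const std::vector<char>& in)
{
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(), FooCall2(&foo, 5));
    return out;
}

// Same operation through a bind expression, which stores &Foo::call2
// as a pointer to member function inside the object it returns.
std::vector<int> with_bind(const Foo& foo, const std::vector<char>& in)
{
    using namespace std::placeholders;
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   std::bind(&Foo::call2, &foo, 5, _1));
    return out;
}

// Sanity check: both approaches must compute identical results.
void check_equivalent(const Foo& foo, const std::vector<char>& in)
{
    assert(with_functor(foo, in) == with_bind(foo, in));
}
```

Whether the bind version ends up slower depends on whether the compiler can see through the stored pointer to member and inline the call, which is exactly what the two listings above disagree on.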
But as we all know, efficiency isn't everything. boost::bind offers much
more flexibility than the former approach. For example, what if we needed
to bind the character instead and leave the integer argument unbound? That
would require a different version of FooCall2, whereas with bind it's a
matter of changing a couple of arguments. That kind of elegance and coding
efficiency is more than worth any potential run-time penalty. And given the
80/20 rule, most of the time the run-time penalty will not affect overall
performance of the program. For the 20% of the cases where it does, you can
always fall back on an alternative method for those few sections of code.
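The flexibility point can be sketched as follows. Again this is an assumption-laden illustration: std::bind (C++11) is used in place of boost::bind, Foo::call2 is simplified to return a value, and bind_int, bind_char and check_bindings are hypothetical names. Switching which argument is bound only changes the argument list of the bind call.

```cpp
#include <cassert>
#include <functional>

// Simplified stand-in for Foo: call2 returns a value so the effect of
// each binding is observable.
class Foo
{
public:
    int call2(int a1, char a2) const { return a1 * 1000 + a2; }
};

// Bind the integer to 5, leave the character unbound.
int bind_int(const Foo& foo, char c)
{
    using namespace std::placeholders;
    return std::bind(&Foo::call2, &foo, 5, _1)(c);
}

// Bind the character to 'c', leave the integer unbound. Only the
// arguments to bind change; no second functor class is needed.
int bind_char(const Foo& foo, int x)
{
    using namespace std::placeholders;
    return std::bind(&Foo::call2, &foo, _1, 'c')(x);
}

// Both bindings end up calling call2(5, 'c'), so they must agree.
void check_bindings(const Foo& foo)
{
    assert(bind_int(foo, 'c') == bind_char(foo, 5));
}
```

With the hand-written approach, bind_char would need a second class alongside FooCall2 that stores a char and takes an int.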
-steve anichini