From: Daniel Frey (d.frey_at_[hidden])
Date: 2002-10-13 04:06:06


On Sat, 12 Oct 2002 22:28:01 +0200, Terje Slettebø wrote:

> I've done some preliminary testing (only tested on one compiler, Intel
> C++ 7.0 beta), to test this hypothesis, to test the various ways of
> implementing operator+(). I made the following test program:

(Note that Intel C++ 6 had a bug in its NRVO: a NRV of type T in a
function which returns const T prevented the NRVO from being applied. I
already told Intel and it's in their pipeline, but I have not yet received
a notification that the issue is solved. Maybe it's already corrected in
v7.)

> --- Start ---
>
> class Test
> {
> public:
> Test(int n) : num(n) {}
>
> Test &operator+=(const Test &other)
> {
> num+=other.num;
>
> return *this;
> }
>
> int num;
> int array[1024]; // Just so that copying shows up
> };

You might want to place some cout's in the ctor/dtor to see when objects
are created and destroyed. This is also important as it prevents
assembler-level optimizations and shows which *objects* are really
optimized away.
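
A minimal sketch of such instrumentation follows. The static counters
(ctor_calls, copy_calls, dtor_calls) and the user-defined copy constructor
are assumptions added for illustration and are not part of the quoted
class. The "array" member is left out for brevity.

```cpp
#include <iostream>

// Instrumented variant of the quoted Test class (a sketch, not the original).
class Test
{
public:
    Test(int n) : num(n)
    {
        ++ctor_calls;
        std::cout << "Test(int) " << this << '\n';
    }

    Test(const Test &other) : num(other.num)
    {
        ++copy_calls;
        std::cout << "Test(const Test&) " << this << '\n';
    }

    ~Test()
    {
        ++dtor_calls;
        std::cout << "~Test() " << this << '\n';
    }

    Test &operator+=(const Test &other)
    {
        num += other.num;
        return *this;
    }

    int num;

    // Hypothetical counters, added only to make the calls countable.
    static int ctor_calls;
    static int copy_calls;
    static int dtor_calls;
};

int Test::ctor_calls = 0;
int Test::copy_calls = 0;
int Test::dtor_calls = 0;
```

Because the copy constructor now has a visible effect, the compiler can
drop a copy only where the language explicitly allows it, so the trace
shows objects rather than instructions.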

> Test test_num1(1);
> Test test_num2(2);
> int num;
> int array;
>
> int main()
> {
> Test test_num=test_num1+test_num2;

It is also important to test expressions like x = a+b+c and x = a+(b+c) to
see the real difference of taking parameters by value :) Especially the
difference between taking the first or the second parameter by value...
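
One way to run that comparison is sketched below, assuming a small
copy-counting class. Counted, copies_left_grouping and
copies_right_grouping are hypothetical names, not part of the quoted test
program. The operator+ is the variant that takes its first parameter by
value.

```cpp
// Hypothetical stand-in for the quoted Test class; it only counts copies.
struct Counted
{
    explicit Counted(int n) : num(n) {}
    Counted(const Counted &other) : num(other.num) { ++copies; }

    Counted &operator+=(const Counted &other)
    {
        num += other.num;
        return *this;
    }

    int num;
    static int copies; // number of copy-constructor calls so far
};

int Counted::copies = 0;

// operator+ taking the first parameter by value.
inline Counted operator+(Counted t1, const Counted &t2)
{
    t1 += t2;
    return t1;
}

// Copy-constructor calls made while evaluating x = a+b+c.
int copies_left_grouping(int &result)
{
    Counted a(1), b(2), c(3);
    Counted::copies = 0;
    Counted x = a + b + c;
    result = x.num;
    return Counted::copies;
}

// Copy-constructor calls made while evaluating x = a+(b+c).
int copies_right_grouping(int &result)
{
    Counted a(1), b(2), c(3);
    Counted::copies = 0;
    Counted x = a + (b + c);
    result = x.num;
    return Counted::copies;
}
```

In a+b+c the temporary from a+b can be constructed directly into the
by-value parameter, while in a+(b+c) the temporary binds to the const
reference and both a and b have to be copied. The exact counts depend on
the compiler and its settings, so treat them as something to measure.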

> num=test_num.num;
> array=test_num.array[0];
> }
>
> First operator+():
>
> Test operator+(const Test &t1,const Test &t2) {
> return Test(t1)+=t2;
> }
>
> [snip]
>
> Note the two "rep movs". This shows copying of the "array" member.
>
> Next version:
>
> inline Test operator+(const Test &t1,const Test &t2) {
> Test nrv(t1);
> nrv+=t2;
> return nrv;
> }
>
> [snip]
>
> Preliminary tests seem to confirm what Howard and Daniel said, that
> using a named temporary, rather than the constructor call with "+=", may
> make more optimised code. There's only one "rep movs" (for copying the
> array) in the code above, compared to two in the first one. The one copy
> is needed for the receiving variable, "test_num", so the above is in
> fact optimal code, with no unnecessary temporaries being created.
>
> Let's try the third alternative:
>
> inline Test operator+(Test t1,const Test &t2) {
> t1+=t2;
>
> return t1;
> }
>
> [snip]
>
> Hm. Back to having two copies, again (two "rep movs").
>
> Note, this is only tested on _one_ compiler, but it may give us
> something to go on. From these results, Daniel's suggestion (second
> version here) turned out to be the most optimised one.
>
> It seems that, at least for this compiler, Andrei's suggestion to pass
> by value if you need to make a copy, anyway, resulted in less optimised
> code. Considering that, in that case, it has to make a copy, to call the
> function, then it's already too late to use the NRVO in the function, as
> it's already a copy, so the above results makes sense.

The point IMHO is that taking the parameter by value may lead to equally
optimized code for *some* cases. In general, only the NRVO can lead to
optimized code for all cases. And a function which takes a const T& and
makes a copy of it is IMHO not lying. If it makes a copy, it's an
implementation detail. I have seen implementations of operator+ which
don't make a copy of the arguments, but why should all these details be
reflected in the function's signature?

> To quote again from above:
>
>> Taking const& T
>> as arguments in /any/ function when you actually *do* need a copy
>> chokes the compiler (and Zuto) and practically forbids them to make
>> important optimizations.
>
> At least for Intel C++, this turns out to be the other way around.
> Calling by value prevents the NRVO.

Yes. And it's not limited to Intel C++, as the standard itself requires
compilers to behave this way. A compiler is basically allowed to remove
temporaries only if it can figure out that this does not have any
observable side effects, or if there are special rules which allow it to
remove temporaries even if there are observable side effects. And I have
never seen any compiler which is smart enough to do the former for objects
like the above 'Test' objects. This is the reason why I think it is
important to apply the NRVO: it is such a special rule, so it permits an
optimization that the compiler cannot figure out itself.
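
The three variants can be compared side by side with a rough sketch like
the following. Counted, add_temporary, add_named, add_by_value and
count_copies are hypothetical names chosen for illustration. The function
bodies mirror the three quoted operator+ versions.

```cpp
// Hypothetical stand-in for the quoted Test class; it only counts copies.
struct Counted
{
    explicit Counted(int n) : num(n) {}
    Counted(const Counted &other) : num(other.num) { ++copies; }

    Counted &operator+=(const Counted &other)
    {
        num += other.num;
        return *this;
    }

    int num;
    static int copies; // number of copy-constructor calls so far
};

int Counted::copies = 0;

// First variant: unnamed temporary plus +=. The return statement copies
// from the reference returned by operator+=, which is not elidable.
inline Counted add_temporary(const Counted &t1, const Counted &t2)
{
    return Counted(t1) += t2;
}

// Second variant: named local object, a candidate for the NRVO.
inline Counted add_named(const Counted &t1, const Counted &t2)
{
    Counted nrv(t1);
    nrv += t2;
    return nrv;
}

// Third variant: first parameter by value. A function parameter is not
// eligible for the NRVO, so returning it requires a copy.
inline Counted add_by_value(Counted t1, const Counted &t2)
{
    t1 += t2;
    return t1;
}

// Copy-constructor calls made while evaluating `Counted x = add(a, b)`.
// Returns -1 if the sum is wrong.
template <typename Add>
int count_copies(Add add)
{
    Counted a(1), b(2);
    Counted::copies = 0;
    Counted x = add(a, b);
    return x.num == 3 ? Counted::copies : -1;
}
```

Calling count_copies(add_named) and the two siblings gives the numbers to
compare. With a compiler that applies the NRVO I would expect add_named to
need fewer copies than the other two, but that is a matter of measuring on
each compiler.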

Regards, Daniel