From: Greg Link (link_at_[hidden])
Date: 2006-05-08 14:48:07
Well, have you considered whether the compiler can be certain about
memory aliasing? In particular, gcc supports the restrict qualifier
(spelled __restrict__ in C++, e.g. double *__restrict__ c), which
promises that the memory pointed to by c will never be accessed
through anything /but/ c. That allows it to make load-store and
register-usage optimizations it couldn't otherwise. In the manually
indexed case it's 100% certain that a[0] will never refer to b[1].
In the looped version, the compiler can't be as sure.
Just a thought that may or may not pan out.  All it takes to try it,
though, is a quick addition of __restrict__, so it's not a tough test.
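
To make the suggestion concrete, here is a minimal sketch of the test I have in mind. The Vec4 type and the add_plain/add_restrict names are made up for illustration and are not taken from the code under discussion; __restrict__ is the gcc spelling (clang accepts it too, other compilers may use a different keyword). Whether it changes the generated code at all is something only the assembly or a timing run will tell you.

```cpp
#include <cstddef>

// Hypothetical fixed-size vector type; name and layout are assumptions.
struct Vec4 {
    double v[4];
};

// Looped add through plain pointers. The compiler has to allow for the
// possibility that a store to out[i] changes some a[j] or b[j].
inline void add_plain(double* out, const double* a, const double* b,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// Same loop, but each pointer is marked __restrict__ (a compiler
// extension in C++). This is a promise from the caller that the three
// ranges do not overlap; passing overlapping ranges is undefined behavior.
inline void add_restrict(double* __restrict__ out,
                         const double* __restrict__ a,
                         const double* __restrict__ b,
                         std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}

// operator+ writes into a fresh local, so the no-overlap promise holds.
inline Vec4 operator+(const Vec4& x, const Vec4& y) {
    Vec4 r;
    add_restrict(r.v, x.v, y.v, 4);
    return r;
}
```

Note that an in-place += must not be routed through add_restrict with out and a pointing at the same array, since that breaks the promise; it would need its own two-pointer version.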
- Greg Link
        Penn State University
        York College of Pennsylvania
On May 8, 2006, at 2:23 PM, Brian Budge wrote:
> Thanks for the ideas guys.
>
> Compile options are like so:
> g++  -O3 -msse -mfpmath=sse
>
> I tried the metaprogramming technique (which is pretty nifty :) ), and
> got interesting results.
>
> Basically, it made my += operator run twice as SLOW, while making my +
> operator run twice as FAST.
>
> I have a feeling that this is all due to the different optimizations
> that gcc is doing at multiple stages of compilation.  For example, it
> may be doing autovectorization of the simple loop case of +=, which it
> can't figure out with the metaprogramming technique.  I'm still
> stumped as to why I'm roughly an order of magnitude slower with + than
> with +=.
>
> Any more insights?
>
> Thanks again for the ideas so far!
>   Brian
>
>
> On 5/8/06, John Maddock <john_at_[hidden]> wrote:
>>> Any ideas how to increase the performance of the new code here?  A
>>> factor of 10 makes it seem like I am just missing something
>>> important.
>>
>> I would suspect it's the loop that's at fault, although I'm very
>> surprised it's a factor of 10.  Your original code had the loop
>> unrolled, so you might try a bit of template metaprogramming to
>> achieve the same effect here.  Otherwise you're going to have to do
>> a bit of debugging and/or inspection of the assembly generated.
>>
>> BTW the measurements you made were in release mode, right?  If inline
>> expansions are turned off (debug mode, for example) the
>> operators-based version may well pass through many more function
>> calls.  Of course these all disappear as long as your compiler does
>> a reasonable job of inlining.
>>
>> HTH, John.
>>
>> _______________________________________________
>> Boost-users mailing list
>> Boost-users_at_[hidden]
>> http://listarchives.boost.org/mailman/listinfo.cgi/boost-users
>>