$include_dir="/home/hyper-archives/boost-users/include"; include("$include_dir/msg-header.inc") ?>
From: Michael Marcin (mmarcin_at_[hidden])
Date: 2007-06-29 16:24:47
Zeljko Vrba wrote:
> On Fri, Jun 29, 2007 at 02:05:59AM -0500, Michael Marcin wrote:
>> There is a senior engineer that I work with who believes templates are 
>> slow and prefers to write C or assembly routines. The templates are slow 
>>
> What does he base his belief on?  And did he provide *any* proof for his
> reasoning?  (Well, if he's in a higher position than you, he might not
> be required to do so.  People listen to him because he's in higher
> position, not because he has good arguments.  Been there, experienced that.)
> 
Apparently from looking at assembly generated from template code in the 
past, with old compilers and probably bad programmers.
>> Write some interesting code and generate the assembly for it.
>> Analyze this assembly manually and save it off in source control.
>> When the test suite is run compile that code down to assembly again and 
>> have the test suite do a simple byte comparison of the two files.
>>
> I don't understand this part.  What do you want to compare?  Macro vs.
> template version?  This will certainly *not* yield identical object file
> (because it contains symbol names, etc. along with generated code).
> 
Yes, this is a little confusing. Essentially the idea was to write 
snippets both in C and with templates and compare the generated 
assembly manually, by eye. Then, once I'm satisfied with the results, the 
regenerate-and-compare tests would hopefully only fail when a meaningful 
change is made to the library code, at which point I would have to 
reexamine the files by hand again. A lot of work... especially when 
multiple configurations come into play.
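For the byte comparison itself, something as small as this would probably 
do (the file names below are just placeholders for whatever the reviewed 
baseline and the regenerated assembly end up being called):

// compare_asm.cpp - byte-wise comparison of a freshly generated .s file
// against the reviewed copy kept in source control (paths are placeholders)
#include <fstream>
#include <iostream>
#include <iterator>

bool identical( const char* a, const char* b )
{
     std::ifstream fa( a, std::ios::binary );
     std::ifstream fb( b, std::ios::binary );
     if( !fa || !fb )
          return false;

     std::istreambuf_iterator<char> ia( fa ), ib( fb ), end;
     while( ia != end && ib != end )
     {
          if( *ia != *ib )
               return false;
          ++ia;
          ++ib;
     }
     return ia == end && ib == end;   // same length as well as same bytes
}

int main( int argc, char* argv[] )
{
     if( argc != 3 )
     {
          std::cerr << "usage: compare_asm baseline.s regenerated.s\n";
          return 2;
     }
     const bool same = identical( argv[1], argv[2] );
     std::cout << ( same ? "identical" : "DIFFERENT" ) << '\n';
     return same ? 0 : 1;   // non-zero exit flags the test as failed
}

In practice the test suite could just as well shell out to cmp or diff; 
the point is that the comparison step itself is trivial, and the hard 
part stays the initial hand inspection.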
>> Write the templated and C versions of the algorithms.
>> Run the test suite to generate the assembly for each version.
>> Write a parser to and a heuristic to analyze the generated code of each.
>> Grade and compare each
>>
> Unfortunately, it's almost meaningless to analyze the run-time performance of a
> program (beyond algorithmic complexity) without the actual input.  "Register
> usage" is a vague term, and the number of function calls does not have to
> play a role (infrequent code paths, large functions, cache effects, etc).
> 
Whether it matters or not is another question, but you can look at the 
generated code and determine whether the compiler is doing a good job.
For instance, say I have:
class my_type
{
public:
     int value() const { return m_value; }
private:
     int m_value;
};
bool operator==( const my_type& lhs, const my_type& rhs )
{
     return lhs.value() == rhs.value();
}
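// test_1 should compile to exactly the same code as test_2 below
// if the my_type abstraction costs nothing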
bool test_1( my_type a, my_type b )
{
     return a == b;
}
bool test_2( int a, int b )
{
     return a == b;
}
Now if test_1 ends up calling a function for operator== or pushes 
anything onto the stack, it's not optimal, and my_type and/or its 
operator== need to be fiddled with.
It's this level of straightforward code I'm concerned with at the moment.
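To give an idea of the kind of heuristic I had in mind for the graded 
comparison, something as crude as the following would already flag the 
call/push cases above. This is only a sketch: the mnemonics are 
target-specific and the tool is made up, not something we have today.

// crude_grade.cpp - count call/push mnemonics in a generated .s file
// (a rough heuristic only; real mnemonics depend on the target assembler)
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main( int argc, char* argv[] )
{
     if( argc != 2 )
     {
          std::cerr << "usage: crude_grade file.s\n";
          return 2;
     }

     std::ifstream in( argv[1] );
     if( !in )
     {
          std::cerr << "cannot open " << argv[1] << '\n';
          return 2;
     }

     std::size_t calls = 0, pushes = 0;
     std::string line, token;
     while( std::getline( in, line ) )
     {
          std::istringstream tokens( line );
          while( tokens >> token )
          {
               // treat anything beginning with "call" or "push" as a hit
               if( token.compare( 0, 4, "call" ) == 0 ) ++calls;
               if( token.compare( 0, 4, "push" ) == 0 ) ++pushes;
          }
     }

     std::cout << argv[1] << ": " << calls << " call(s), "
               << pushes << " push(es)\n";
     return 0;
}

Obviously that says nothing about whether the calls matter at run time, 
as you point out; it just flags the places I'd want to look at by hand.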
>> Does anyone have any input/ideas/suggestions?
>>
> How about traditional profiling?  Write a test-suite that feeds the same
> input to C and C++ versions and compares their run-time?  Compile once
> with optimizations, other time with profiling and compare run-times and
> hot-spots shown by the profiler.
As I said before, there is no reliable timing mechanism available, and the 
process of compiling, installing, and running programs on this target 
cannot be automated, AFAIK.
Thanks,
Michael Marcin