$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] BOOST_PP array extension proposal
From: Paul Mensonides (pmenso57_at_[hidden])
Date: 2015-09-11 07:04:46
On 9/10/2015 6:59 PM, Matt Calabrese wrote:
> Yeah, I have to do that now with GCC, too, but FWIW the problems I was
> having were even before GCC added macro expansion tracking. I'm not sure
> what GCC does that's so memory intensive during preprocessing.
To some degree it depends on whether the preprocessor is doing what it 
is supposed to do at all or just something that kinda-sorta results in 
the same thing.  One of the biggest performance hits that I have seen in 
those preprocessors that do it right (or at least mostly) has to do with 
what happens with actual arguments to macros.
Consider what happens here:
#define A 1 B
#define B 2 C
#define C 3 D
#define D 4
0 A 5
This *looks* like four nested "calls," but it isn't.  Rather, this is a 
stream.  Ignoring the whole disabling context/blue paint for the moment, 
the replacement process proceeds as follows:
0 A 5
   ^
0 1 B 5
   ^
0 1 B 5
     ^
0 1 2 C 5
     ^
0 1 2 C 5
       ^
0 1 2 3 D 5
       ^
0 1 2 3 D 5
         ^
0 1 2 3 4 5
         ^
0 1 2 3 4 5
           ^
0 1 2 3 4 5
             ^
In this scenario, once the scan position has advanced past a 
preprocessing token, that token is "done".  I.e. it is pure output (to 
standard output, to a file, to the parser, or whatever).
Now change the scenario slightly:
#define M(x) x
M(0 A 5)
In this case, the argument is scanned for macro replacement as an 
independent scan.  That whole scan takes place as above, the result of 
which is substituted for 'x' in the replacement list, the invocation of 
'M' is replaced by that result, the scan position is placed at the first 
token of that result, and top-level scanning resumes.
The overall difference here is that the presence of the argument (where 
the argument is used in the replacement list as a non # or ## operand) 
causes what amounts to a recursive call to the entire scanning for macro 
replacement process.  The results of which must be placed in a buffer, 
not output, in order to perform the substitution.
So, A defined as B defined as C (and so on) is not recursive, but 
M(M(M())) usually *is* recursive.
As an immediate testable example, Chaos has a macro that is used to 
count scans for macro replacement:
#include <chaos/preprocessor/recursion/delve.h>
#define A(x) B(x)
#define B(x) C(x)
#define C(x) D(x)
#define D(x) E(x)
#define E(x) x
CHAOS_PP_HALT(
     A(
         CHAOS_PP_DELVE()
     )
)
The result here is 6 (on entry to A, on entry to B, on entry to C, on 
entry to D, on entry to E, and finally when top level scanning is resumed).
Change the above to:
#include <chaos/preprocessor/recursion/delve.h>
#define A(x)    B(, x)
#define B(p, x) C(, p##x)
#define C(p, x) D(, p##x)
#define D(p, x) E(, p##x)
#define E(p, x) p##x
CHAOS_PP_HALT(
     A(
         CHAOS_PP_DELVE()
     )
)
Now you get 2 (on entry to A and when top level scanning is resumed). 
This occurs because the placemarker concatenation is stopping the 
argument from being scanned for macro replacement all the way through 
most of the structure.  (When this type of thing is scaled up, it can be 
*massively* faster than not doing it.)
None of these scenarios here are actually too bad because the recursion 
depth is shallow (and by "recursion depth" I mean, of course, the 
recursive scan depth) so you aren't getting buffers upon buffers, etc.. 
  However, a library built with performance in mind from the ground up 
(i.e. not Boost.Preprocessor and not Chaos in many cases), has to be 
built with the above type of stuff in mind.
With regard to tuples versus arrays...  The answer is really "neither". 
  Both are bad choices for a data structure.  Usually they are used for 
passing multiple auxiliary arguments through some higher-order 
algorithm.  The right solution there is to simply never pack them 
together in that fashion at all:
#define M(s, n, a, b, c) (a + b + c)
CHAOS_PP_EXPR(CHAOS_PP_REPEAT(
     5, M, 1, 2, 3
))
(i.e. no TUPLE_ELEM in sight)
You can only get random access to tuple elements for a small number of 
elements.  So for any largish container (i.e. where performance matters 
more), algorithmic processing is usually more efficient with a sequence. 
  In a library like Chaos, it also makes much more efficient use of the 
available recursion steps because things like folds can be unrolled 
(because the data structure itself provides the computational horsepower 
to process itself--even if the algorithm is n^2 or worse).
But, given a choice between array and tuple as a container (rather than 
just a parameter packing mechanism), I would choose tuples.  Otherwise, 
any algorithmic processing of the data structure is going to require 
arithmetic to maintain the size of the array.  (If you statically know 
the size of the array, then you aren't using it as a container, but 
rather just a packing structure.)
Regards,
Paul Mensonides