From: eric_at_[hidden]
Date: 2007-10-09 18:25:19
Author: eric_niebler
Date: 2007-10-09 18:25:18 EDT (Tue, 09 Oct 2007)
New Revision: 39867
URL: http://svn.boost.org/trac/boost/changeset/39867
Log:
more user docs for semantic actions
Text files modified: 
   trunk/libs/xpressive/doc/actions.qbk |   200 +++++++++++++++++++++++++++++++++++++++ 
   1 files changed, 198 insertions(+), 2 deletions(-)
Modified: trunk/libs/xpressive/doc/actions.qbk
==============================================================================
--- trunk/libs/xpressive/doc/actions.qbk	(original)
+++ trunk/libs/xpressive/doc/actions.qbk	2007-10-09 18:25:18 EDT (Tue, 09 Oct 2007)
@@ -37,7 +37,7 @@
         sregex pair = ( (s1= +_w) >> "=>" >> (s2= +_d) )
             [ ref(result)[s1] = as<int>(s2) ];
         
-        // Match one or more word/iteger pairs, separated
+        // Match one or more word/integer pairs, separated
         // by whitespace.
         sregex rx = pair >> *(+_s >> pair);
 
@@ -85,9 +85,205 @@
 say `ref(result)[s1]`, even though `result` doesn't have an `operator[]` that
 would accept `s1`.]
 
+In addition to the sub-match placeholders `s1`, `s2`, etc., you can also use
+the placeholder `_` within an action to refer back to the string matched by
+the sub-expression to which the action is attached. For instance, you can use
+the following regex to match a bunch of digits, interpret them as an integer
+and assign the result to a local variable:
+
+    int i = 0;
+    // Here, _ refers back to all the
+    // characters matched by (+_d)
+    sregex rex = (+_d)[ ref(i) = as<int>(_) ];
+
 [h3 Lazy Action Execution]
 
-TODO
+What does it mean, exactly, to attach an action to part of a regular expression
+and perform a match? When does the action execute? If the action is part of a
+repeated sub-expression, does the action execute once or many times? And if the
+sub-expression initially matches, but ultimately fails because the rest of the
+regular expression fails to match, is the action executed at all?
+
+The answer is that actions are executed /lazily/. When a sub-expression
+matches a string, its action is placed on a queue, along with the current
+values of any sub-matches to which the action refers. If the match algorithm
+must backtrack, actions are popped off the queue as necessary. Only after the
+entire regex has matched successfully are the actions actually executed. They
+are executed all at once, in the order in which they were added to the queue,
+as the last step before _regex_match_ returns.
+
+For example, consider the following regex that increments a counter whenever
+it finds a digit.
+
+    int i = 0;
+    std::string str("1!2!3?");
+    // count the exciting digits, but not the
+    // questionable ones.
+    sregex rex = +( _d [ ++ref(i) ] >> '!' );
+    regex_search(str, rex);
+    assert( i  == 2 );
+
+The action `++ref(i)` is queued three times: once for each found digit. But
+it is only /executed/ twice: once for each digit that precedes a `'!'`
+character. When the `'?'` character is encountered, the match algorithm
+backtracks, removing the final action from the queue.
+
+[h3 Referring to Local Variables]
+
+As we've seen in the examples above, we can refer to local variables within
+an action using `xpressive::ref()`. Any such variables are held by reference
+by the regular expression, and care should be taken to avoid letting those
+references dangle. For instance, in the following code, the reference to `i`
+is left to dangle when `bad_voodoo()` returns:
+
+    sregex bad_voodoo()
+    {
+        int i = 0;
+        sregex rex = +( _d [ ++ref(i) ] >> '!' );
+        // ERROR! rex refers by reference to a local
+        // variable, which will dangle after bad_voodoo()
+        // returns.
+        return rex;
+    }
+
+When writing semantic actions, it is your responsibility to make sure that
+no reference is left dangling. One way to do that would be to make the
+variables shared pointers that are held by the regex by value.
+
+    sregex good_voodoo(boost::shared_ptr<int> pi)
+    {
+        // Use val() to hold the shared_ptr by value:
+        sregex rex = +( _d [ ++*val(pi) ] >> '!' );
+        // OK, rex holds a reference count to the integer.
+        return rex;
+    }
+
+In the above code, we use `xpressive::val()` to hold the shared pointer by
+value. That's not normally required, because local variables appearing in
+actions are held by value by default, but in this case it is. Had
+we written the action as `++*pi`, it would have executed immediately. That's
+because `++*pi` is not an expression template, but `++*val(pi)` is.
+
+It can be tedious to wrap all your variables in `ref()` and `val()` in your
+semantic actions. Xpressive provides the `reference<>` and `value<>` templates
+to make things easier. The following table shows the equivalencies:
+
+[table reference<> and value<>
+[[This ...][... is equivalent to this ...]]
+[[``int i = 0;
+
+sregex rex = +( _d [ ++ref(i) ] >> '!' );``][``int i = 0;
+reference<int> ri(i);
+sregex rex = +( _d [ ++ri ] >> '!' );``]]
+[[``boost::shared_ptr<int> pi(new int(0));
+
+sregex rex = +( _d [ ++*val(pi) ] >> '!' );``][``boost::shared_ptr<int> pi(new int(0));
+value<boost::shared_ptr<int> > vpi(pi);
+sregex rex = +( _d [ ++*vpi ] >> '!' );``]]
+]
+
+As you can see, when using `reference<>`, you need to first declare a local
+variable and then declare a `reference<>` to it. These two steps can be combined
+into one using `local<>`. 
+
+[table local<> vs. reference<>
+[[This ...][... is equivalent to this ...]]
+[[``local<int> i(0);
+
+sregex rex = +( _d [ ++i ] >> '!' );``][``int i = 0;
+reference<int> ri(i);
+sregex rex = +( _d [ ++ri ] >> '!' );``]]
+]
+
+We can use `local<>` to rewrite the above example as follows:
+
+    local<int> i(0);
+    std::string str("1!2!3?");
+    // count the exciting digits, but not the
+    // questionable ones.
+    sregex rex = +( _d [ ++i ] >> '!' );
+    regex_search(str, rex);
+    assert( i.get() == 2 );
+
+Notice that we use `local<>::get()` to access the value of the local
+variable. Also, beware that `local<>` can be used to create a dangling
+reference, just as `reference<>` can.
+
+[h3 Lazy Functions]
+
+So far, we've seen how to write semantic actions consisting of variables and
+operators. But what if you want to be able to call a function from a semantic
+action? Xpressive provides a mechanism to do this.
+
+The first step is to define a function object type. Here, for instance, is a 
+function object type that calls `push()` on its argument:
+
+    struct push_impl
+    {
+        // Result type, needed for tr1::result_of
+        typedef void result_type;
+
+        template<typename Sequence, typename Value>
+        void operator()(Sequence &seq, Value const &val) const
+        {
+            seq.push(val);
+        }
+    };
+
+The next step is to use xpressive's `function<>` template to define a function
+object named `push`:
+
+    // Global "push" function object.
+    function<push_impl>::type const push = {{}};
+
+The initialization looks a bit odd, but this is because `push` is being 
+statically initialized. That means it doesn't need to be constructed
+at runtime. We can use `push` in semantic actions as follows:
+
+    std::stack<int> ints;
+    // Match digits, cast them to an int
+    // and push it on the stack.
+    sregex rex = (+_d)[push(ref(ints), as<int>(_))];
+
+You'll notice that doing it this way causes member function invocations
+to look like ordinary function invocations. You can choose to write your
+semantic action in a different way that makes it look a bit more like
+a member function call:
+
+    sregex rex = (+_d)[ref(ints)->*push(as<int>(_))];
+
+Xpressive recognizes the use of the `->*` and treats this expression
+exactly the same as the one above.
+
+When your function object must return a type that depends on its
+arguments, you can use a `result<>` member template instead of the
+`result_type` typedef. Here, for example, is a `first` function object
+that returns the `first` member of a `std::pair<>`:
+
+    // Function object that returns the 
+    // first element of a pair.
+    struct first_impl
+    {
+        template<typename Sig> struct result {};
+
+        template<typename This, typename Pair>
+        struct result<This(Pair)>
+        {
+            typedef typename remove_reference<Pair>
+                ::type::first_type type;
+        };
+
+        template<typename Pair>
+        typename Pair::first_type
+        operator()(Pair const &p) const
+        {
+            return p.first;
+        }
+    };
+
+    // OK, use as first(s1) to get the begin iterator
+    // of the sub-match referred to by s1.    
+    function<first_impl>::type const first = {{}};
 
 [endsect]
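
The queue-then-commit behaviour described under "Lazy Action Execution" in this changeset can be sketched without xpressive at all. The following is only an illustrative model, not xpressive's implementation: `match_digit_bang_groups` and `count_exciting_digits` are hypothetical names, and the matcher is hard-wired to the pattern `+( _d [ action ] >> '!' )` anchored at the start of the input.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not part of xpressive): matches one or more
// "<digit>!" groups at the start of `input`. The action is queued once
// per matched digit and removed again if its group fails to complete.
// Returns the number of actions executed, or 0 if nothing matched.
std::size_t match_digit_bang_groups(const std::string &input,
                                    const std::function<void()> &action)
{
    std::vector<std::function<void()>> queue; // deferred actions
    std::size_t pos = 0;

    while (pos < input.size()
           && std::isdigit(static_cast<unsigned char>(input[pos])))
    {
        queue.push_back(action);      // digit matched: queue it, don't run it
        if (pos + 1 < input.size() && input[pos + 1] == '!')
        {
            pos += 2;                 // group complete: keep the queued action
        }
        else
        {
            queue.pop_back();         // backtrack: discard the queued action
            break;
        }
    }

    if (queue.empty())
        return 0;                     // overall match failed: run nothing

    for (const auto &queued : queue)  // match succeeded: run in queue order
        queued();
    return queue.size();
}

// Mirrors the "exciting digits" example: counts digits followed by '!'.
int count_exciting_digits(const std::string &input)
{
    int i = 0;
    match_digit_bang_groups(input, [&i] { ++i; });
    return i;
}
```

Calling `count_exciting_digits("1!2!3?")` queues the increment three times but runs it only twice, because the entry queued for `3` is removed when `'?'` is seen instead of `'!'`.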