$include_dir="/home/hyper-archives/boost-commit/include"; include("$include_dir/msg-header.inc") ?>
From: eric_at_[hidden]
Date: 2008-06-12 19:50:13
Author: eric_niebler
Date: 2008-06-12 19:50:13 EDT (Thu, 12 Jun 2008)
New Revision: 46363
URL: http://svn.boost.org/trac/boost/changeset/46363
Log:
add implementation notes appendix, document tracking_ptr
Added:
   trunk/libs/xpressive/doc/tracking_ptr.qbk   (contents, props changed)
Text files modified: 
   trunk/libs/xpressive/doc/xpressive.qbk |     6 ++++++                                  
   1 files changed, 6 insertions(+), 0 deletions(-)
Added: trunk/libs/xpressive/doc/tracking_ptr.qbk
==============================================================================
--- (empty file)
+++ trunk/libs/xpressive/doc/tracking_ptr.qbk	2008-06-12 19:50:13 EDT (Thu, 12 Jun 2008)
@@ -0,0 +1,172 @@
+[/
+ / Copyright (c) 2008 Eric Niebler
+ /
+ / Distributed under the Boost Software License, Version 1.0. (See accompanying
+ / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
+ /]
+
+[section Cycle collection with [^tracking_ptr<>]]
+
+In xpressive, regex objects can refer to each other and themselves by value or by reference.
+In addition, they ref-count their referenced regexes to keep them alive. This creates the possibility
+for cyclic reference counts, and raises the possibility of memory leaks. xpressive avoids leaks
+by using a type called `tracking_ptr<>`. This doc describes at a high level how `tracking_ptr<>`
+works.
+
+[h2 Constraints]
+
+Our solution must meet the following design constraints:
+
+* No dangling references: All objects referred to directly or indirectly must be kept alive
+  as long as the references are needed.
+* No leaks: all objects must be freed eventually.
+* No user intervention: The solution must not require users to explicitly invoke some cycle
+  collection routine.
+* Clean-up is no-throw: The collection phase will likely be called from a destructor, so it
+  must never throw an exception under any circumstance.
+  
+[h2 Handle-Body Idiom]
+
+To use `tracking_ptr<>`, you must separate your type into a handle and a body. In the case of
+xpressive, the handle type is called basic_regex<> and the body is called regex_impl<>. The
+handle will store a `tracking_ptr<>` to the body.
+
+The body type must inherit from enable_reference_tracking<>. This gives the body the bookkeeping
+data structures that `tracking_ptr<>` will use. In particular, it gives the body:
+
+# `std::set<shared_ptr<body> > refs_` : collection of bodies to which this body refers, and
+# `std::set<weak_ptr<body> > deps_` : collection of bodies which refer to this body.
+
+[h2 References and Dependencies]
+
+We refer to (1) above as the "references" and (2) as the "dependencies". It is crucial to the
+understanding of `tracking_ptr<>` to recognize that the set of references includes both those
+objects that are referred to directly as well as those that are referred to indirectly (that
+is, through another reference). The same is true for the set of dependencies. In other words,
+each body holds a ref-count directly to every other body that it needs. 
+
+Why is this important?  Because it means that when a body no longer has a handle referring
+to it, all its references can be released immediately without fear of creating dangling references. 
+
+References and dependencies cross-pollinate. Here's how it works:
+
+# When one object acquires another as a reference, the second object acquires the first as
+  a dependency.
+# In addition, the first object acquires all of the second object's references, and the second
+  object acquires all of the first object's dependencies.
+# When an object picks up a new reference, the reference is also added to all dependent objects.
+# When an object picks up a new dependency, the dependency is also added to all referenced
+  objects.
+# An object is never allowed to have itself as a dependency. Objects may have themselves as
+  references, and often do.
+
+Consider the following code:
+
+    sregex expr;
+    {
+        sregex group  = '(' >> by_ref(expr) >> ')';                 // (1)
+        sregex fact   = +_d | group;                                // (2)
+        sregex term   = fact >> *(('*' >> fact) | ('/' >> fact));   // (3)
+        expr          = term >> *(('+' >> term) | ('-' >> term));   // (4)
+    }                                                               // (5)
+
+Here is how the references and dependencies propagate, line by line:
+
+[table
+[[Expression][Effects]]
+[[1) `sregex group  = '(' >> by_ref(expr) >> ')';`]
+[[^group: cnt=1 refs={expr} deps={}\n
+expr:  cnt=2 refs={} deps={group}]]]
+
+[[2) `sregex fact   = +_d | group;`]
+[[^group: cnt=2 refs={expr} deps={fact}\n
+expr:  cnt=3 refs={} deps={group,fact}\n
+fact:  cnt=1 refs={expr,group} deps={}]]]
+
+[[3) `sregex term   = fact >> *(('*' >> fact) | ('/' >> fact));`]
+[[^group: cnt=3 refs={expr} deps={fact,term}\n
+expr:  cnt=4 refs={} deps={group,fact,term}\n
+fact:  cnt=2 refs={expr,group} deps={term}\n
+term:  cnt=1 refs={expr,group,fact} deps={}]]]
+
+[[4) `expr          = term >> *(('+' >> term) | ('-' >> term));`]
+[[^group: cnt=5 refs={expr,group,fact,term} deps={expr,fact,term}\n
+expr:  cnt=5 refs={expr,group,fact,term} deps={group,fact,term}\n
+fact:  cnt=5 refs={expr,group,fact,term} deps={expr,group,term}\n
+term:  cnt=5 refs={expr,group,fact,term} deps={expr,group,fact}]]]
+
+[[5) `}`]
+[[^expr:  cnt=2 refs={expr,group,fact,term} deps={group,fact,term}]]]
+]
+
+This shows how references and dependencies propagate when creating cycles of objects. After
+line (4), which closes the cycle, every object has a ref-count on every other object, even
+to itself. So how does this not leak? Read on.
+
+[h2 Cycle Breaking]
+
+Now that the bodies have their sets of references and dependencies, the hard part is done.
+All that remains is to decide when and where to break the cycle. That is the job of `tracking_ptr<>`,
+which is part of the handle. The `tracking_ptr<>` holds 2 `shared_ptr`s. The first, obviously,
+is the `shared_ptr<body>` -- the reference to the body to which this handle refers. The other
+`shared_ptr` is used to break the cycle. It ensures that when all the handles to a body go out
+of scope, the body's set of references is cleared.
+
+This suggests that more than one handle can refer to a body. In fact, `tracking_ptr<>` gives
+you copy-on-write semantics -- when you copy a handle, the body is shared. That makes copies
+very efficient. Eventually, all the handles to a particular body go out of scope. When that
+happens, the ref count to the body might still be greater than 0 because some other body (or
+this body itself!) might be holding a reference to it. However, we are certain that the cycle-breaker's
+ref-count goes to 0 because the cycle-breaker only lives in handles. No more handles, no more
+cycle-breakers.
+
+What does the cycle-breaker do? Recall that the body has a set of references of type 
+`std::set<shared_ptr<body> >`. Let's call this type "references_type". The cycle-breaker is a
+`shared_ptr<references_type>`. It uses a custom deleter, which is defined as follows:
+
+  template<typename DerivedT>
+  struct reference_deleter
+  {
+      void operator ()(std::set<shared_ptr<DerivedT> > *refs) const
+      {
+          refs->clear();
+      }
+  };
+
+The job of to the cycle breaker is to ensure that when the last handle to a body goes away,
+the body's set of references is cleared. That's it.
+
+We can clearly see how this guarantees that all bodies are cleaned up eventually. Once every
+handle has gone out of scope, all the bodies' sets of references will be cleared, leaving none
+with a non-zero ref-count. No leaks, guaranteed.
+
+It's a bit harder to see how this guarantees no dangling references. Imagine that there are 3 
+bodies: A, B and C. A refers to B which refers to C. Now all the handles to B go out of scope,
+so B's set of references is cleared. Doesn't this mean that C gets deleted, even though it
+is being used (indirectly) by A? It doesn't. This situation can never occur because we propagated
+the references and dependencies above such that A will be holding a reference directly to C
+in addition to B. When B's set of references is cleared, no bodies get deleted, because they
+are all still in use by A.
+
+[h2 Future Work]
+
+All these `std::set`s and `shared_ptr`s and `weak_ptr`s! Very inefficient. I used them because
+they were handy. I could probably do better.
+
+Also, some objects stick around longer than they need to. Consider:
+
+    sregex b;
+    {
+        sregex a = _;
+        b = by_ref(a);
+        b = _;
+    }
+    // a is still alive here!
+
+Due to the way references and dependencies are propagated, the `std::set` of references can only
+grow. It never shrinks, even when some references are no longer needed. For xpressive this
+isn't an issue. The graphs of referential objects generally stay small and isolated. If someone
+were to try to use `tracking_ptr<>` as a general ref-count-cycle-collection mechanism, this problem
+would have to be addressed.
+
+[endsect]
Modified: trunk/libs/xpressive/doc/xpressive.qbk
==============================================================================
--- trunk/libs/xpressive/doc/xpressive.qbk	(original)
+++ trunk/libs/xpressive/doc/xpressive.qbk	2008-06-12 19:50:13 EDT (Thu, 12 Jun 2008)
@@ -122,5 +122,11 @@
 
 [include perf.qbk]
 
+[section Appendix 5: Implementation Notes]
+
+[include tracking_ptr.qbk]
+
+[endsect]
+
 [endsect]