$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
From: Douglas Gregor (gregod_at_[hidden])
Date: 2002-06-11 22:40:51
On Tuesday 11 June 2002 11:25 am, Itay Maman wrote:
> It's nice to know I am not the only one who thinks a
> type-list based union is a good idea :)
Actually, I'm not convinced that type lists are the best choice for 
implementing a discriminated union type. I'll explain this after I've given 
some more interface opinions. 
> > Here's the current requirements list from the Wiki
>
> page:
> >     * Interface should support construction and
>
> assignment from any of the
>
> > types that the variant can hold.
>
> What about automatic casting from non-specified types?
> (For example: initializing a union of <float, string>
> with a double value). Personally, I think this
> behavioral detail should be specified by a policy.
I agree that we need implicit conversions to types stored in the union (e.g., 
your variant<float,string> implicitly constructed from a double example), but 
I don't agree that this should be a policy. The reason will be discussed 
below where it comes up again.
> In general, I see four separate, policy-controlled,
> aspects:
I try to avoid policy-based designs when possible, because they introduce a 
large amount of variance in the semantics of a single component. I'll discuss 
each of the four potential policies below:
> 1) Storage: (i) boost::any (ii) on-stack
When does one prefer the boost::any storage method instead of the on-stack 
method? The on-stack approach is a lightweight approach that closely 
resembles C-style unions; in the cases where variant objects are too large to 
put on the stack, the variant objects can easily be allocated on the heap 
(just like we would do with a C-style union). The only cases where on-stack 
allocation isn't feasible occur with recursive or incomplete types (discussed 
later...). So, I would say that we should only use on-stack allocation but 
allow an 'escape' to heap allocation for recursion and incomplete types.
> 2) Automatic casting: (i) never, (ii) cast to the
> first type possible, (iii) cast if only one possible
> type found
How about:
  (iv) use overload resolution to choose the best type, or fail if no best 
type exists
Again, I don't see a need for a policy here. Overload resolution is a C++ 
feature that every C++ programmer is aware of and it seems the natural 
solution for this component. Here are the reasons why I reject the other 
three:
  (i) implicit conversion is a feature of C++. Suppressing implicit 
conversions should be performed by user types, not by the variant, because 
the set of implicit conversions for a type is a property of the type itself, 
not a container of that type.
  (ii) this is very unnatural. You could end up with something like:
variant<int, float> v;
v = 3.14159; // v stores the integer 3
  (iii) this is reasonable, but again: overload resolution is already 
understood as a mechanism for finding the best alternative, so why not reuse 
it?
Side note: the desire for overload resolution is why I am unsure of a purely 
typelist-based approach. Getting all of the right overload candidates there 
given just a typelist is horrific.
  
> 3) Assignmenmt error: (i) Compile-time error, (ii)
> initialize with default value of first type
I don't understand why (ii) would ever be desirable. We should be checking for 
errors as early as possible.
> The fourth aspect relates to the issue of variant to
> variant assigmnet:
>    Variant<List1> u1;
>    Variant<List2> u2;
>    ...
>    u1 = u2;  //Problem(?): u2's held type may be
> invalid
>              //            for assignment into u1
>
>
> 4) Variant to variant assigmnet error: (i)
> Compile-time error if List2 is not a sub set of List1.
> (ii) Run-time error (iii) assign the default value of
> the first type on List1
How about:
  (iv) compile-time error if for any of the types of the source variant there 
is no 'best candidate' type in the target variant. So something like this 
should be fine:
  variant<int, float, double> v1;
  variant<long, double> v2;
  v1 = v2; // okay
  
Again, I can't see a reason for using (ii) or (iii) because we should be 
checking for errors as soon as possible. The behavior of (iii) could drive 
one positively mad during a late-night debugging session. (i) is okay, but 
overly strict if we want implicit conversions (I do).
>
> [snip]
>
> >     * Recursive types should be allowed, i.e., a
>
> variant can hold a value of a
>
> > type that is constructed using the variant type
> >     * Incomplete types should be allowed.
>
> Can you clarify on these two points?
I'll try to clarify by example. The canonical example for the use of recursive 
types in variants is an expression tree. In an expression tree, each 
expression node can be, for instance, a literal value (e.g., '2'), a variable 
(e.g., 'x'), or an operation on other expressions (e.g., 'expression + 
expression'). The last of these requires the expression type to be recursive, 
because the types it can store depend on the expression type itself. If we 
were using C-style unions, we might represent an expression like this:
struct expr {
  enum { et_literal, et_variable, et_plus, et_negate } type;
  union {
    int literal;
    variable_t* variable;
    std::pair<expr*, expr*> plus;
    expr* negate;
  } value;
};
We'd like to do the same with our generic discriminated union type. I 
personally aspire to have a variant type that is similar in expressive power 
to that of ML, where the above might be written as:
type expr =
  | Int of int
  | Var of variable
  | Plus of expr * expr
  | Negate of expr
  
>From the implementation point of view, recursive types aren't hard to deal 
with: just don't try to allocate them on the stack. The complexity is stating 
the recursion when using the variant type. My personal preference is to use a 
pseudo-keyword 'rec', e.g.,
  typedef variant<int, variable_t*, Plus<rec, rec>, Negate<rec> > expr;
As for incomplete types, refer back to the C-style union version of 'expr'. 
The variable_t type doesn't need to be defined because we're only holding a 
pointer to it. If we could specify that a particular type is incomplete 
(therefore requesting heap allocation), then we could easily use incomplete 
types with the variant. It might look like this:
  typedef variant<int, incomplete<variable_t>, Plus<rec, rec>, Negate<rec> > 
expr;
> >   - The visitor is fine, but there are simpler
>
> syntactic constructs that could
>
> > be used. For instance, instead of 'visit_at' just
>
> use the function call
>
> > operator.
>
> The function call operator cannot be static, which is
> somewhat restrictive.
How so? Instead of a static function call operator, just default-construct an 
empty class.
> > Also, the 'Visitor' class can be an implementation
>
> detail with a
>
> > (more natural?) syntax like:
> >   Super_union<...> u0;
> >   // ...
> >   u0.visit(size_visit());
>
> This syntax is indeed more natural, when the visit
> policy is stateless. However, if we make go() to
> return *this, we end up with a very clean way for
> applying the operation to multiple objects:
>
>    typedef Super_union<The_list> Concrete_union;
>    Concrete_union::Visitor<size_visit>::type
> size_visitor;
>
>    // ...
>
>    int result =
> size_visitor.go(u0).go(u1).go(u2).total_;
Have you needed this type of behavior before? I can't come up with any 
scenarios where it would help. Does it make the code shorter or more obvious? 
Just the calls to 'go' don't tell the whole story, because there is a 
significant amount of overhead (in C++ code, not at run-time) with this style 
of visitor: the visitor's 'visit_at' routine must be templated over the 
variant type and the 'Visitor' type generator must be used. Does the benefit 
of easy application to multiple variant objects outweigh the more complicated 
interface? If yes, there is a second part: does this benefit also outweigh 
the cost of create a new type of interface (visit_at and the Visitor type 
generator) instead of an interface based on a well-understood concept (a 
function object)?
Following is a partial sketch of the interface I have in mind. I've omitted 
details that we haven't really discussed yet or that I haven't thought about 
long enough to have a strong opinion on (tagged types, recursive types, 
stating type incompleteness, etc.).
        Doug
template<typename T1, typename T2 = unused2, ..., typename TN = unusedN>
class variant 
{
public:
  variant(); // default-construct value of type T1
  variant(const T1& t1); // assign value t1
  variant(const T2& t2); // assign value t2
  // ...
  variant(const TN& tN); // assign value tN
  ~variant();
  void swap(variant& other); // swap values
  variant& operator=(const T1& t1); // assign value t1
  variant& operator=(const T1& t1); // assign value t1
  // ...
  variant& operator=(const TN& tN); // assign value tN
  variant& operator=(const variant& other); // assign value from other
  
  // semantics described above
  template<typename OT1, typename OT2, ..., typename OTN>
  variant& operator=(const variant<OT1, OT2, ..., OTN>& other);
  const std::type_info& type() const; // typeid(type of current value)
  bool empty() const { return false; } // boost::any compatibility
};
struct bad_variant_cast : public std::runtime_error { /* ... */ };
// cast to type T if current value is of type T; otherwise, throw 
// bad_variant_cast
template<typename T, typename T1, typename T2, ..., typename TN>
  T& variant_cast(variant<T1, T2, ..., TN>& v); 
template<typename T, typename T1, typename T2, ..., typename TN>
  const T& variant_cast(const variant<T1, T2, ..., TN>& v);