From: Andreas Huber (ahd6974-spamgroupstrap_at_[hidden])
Date: 2005-03-12 09:25:33
Jonathan Turkanis wrote:
>> Do I really say this? I don't see any place where I say or imply that
>> something can't be done because *I* couldn't think of a solution.
>
> Of course not. Otherwise I wouldn't have had to make up the obviously
> fake quote. Consider: "Quite a bit of effort has gone into making the
> library fast for small simple machines and scalable at the same time
> ... the scalability does not come for free." I naturally assumed it
> was your effort you were talking about.
Yes, but I don't see how one can *infer* from the above that it can't be
done because *I* haven't found ways to do it. Anyway, this is rather OT
and I won't argue any more about this.
> I'm just arguing that the Rationale doesn't adequately explain why
> better performance is not possible. I'm willing to be convinced that
> it isn't possible, and that you've made the best possible effort to
> find more efficient alternatives.
Define "best possible effort". How much time do you think I should spend
on this?
>>> - Two-dimensional array of function pointers. This explanation makes
>>> no sense to me, even as supplemented by Andreas's reply to Iain
>>> Hanson this morning. It's common for a composite component to make
>>> use of several components implemented in different TUs. For
>>> instance, the composite component may "know about" the other
>>> components by derivation, by containment, typelists, etc. When it
>>> needs to assemble the array of pointers, it can simply ask each of
>>> them for their contribution.
>>
>> No, in a nutshell it cannot because it doesn't know how many or what
>> components exist.
>
> I'm not talking about the proposed library here.
Neither am I; I'm talking about any library satisfying the 9
requirements.
> I accept that it's
> not possible in your implementation.
Noted.
>> Of course, there must be links between the
>> components
>> but C++ does not allow you to follow these links in a useful way and
>> there is no way to force the components in the TUs to register
>> themselves. The only way you can guarantee the registration is by
>> requiring the programmer to register all the parts explicitly, which
>> is contrary to the scalability requirements.
>
> You frequently mention "manually registering" components. It would be
> helpful if you would explain what you mean by this. For example, a
> specialization of std::pair has two components, first_type and
> second_type. When I write
>
> typedef std::pair<int, float> pair_type;
>
> does this count as "manually registering" the components of
> pair_type, since I have to know what they are and write them down?
I'm inclined to answer yes, but I'm not sure whether this example isn't
too far removed from what we're currently discussing.
> Lots of C++ libraries allow users to construct composite components
> out of components defined in different TUs.
I think it is worth noting that "usual" C++ components consist of a
declaration and a definition. In the example above, if int and float
were user-defined types (classes), you'd need to include their header
files. In the context of this discussion this does *not* count as
multiple TUs. That's because I don't see how such a separation is
useful/feasible for states (as always: suggestions are welcome), i.e.
state declaration and definition are more or less inevitably in one
place.
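
To make that concrete: everything a separate "declaration" of a state
could say, i.e. its context, its reactions and its inner initial state,
already sits in the template argument list of the state's base class, and
the entry and exit actions are the constructor and destructor. Roughly
(a sketch simplified from the tutorial; event definitions elided):

  struct Stopped; // an incomplete type is all that is ever needed up front

  struct Active : fsm::simple_state<
    Active,                              // the state itself (CRTP)
    StopWatch,                           // its context (outer state/machine)
    fsm::transition< EvReset, Active >,  // its reactions
    Stopped >                            // its inner initial state
  {
    Active() {}   // entry action
    ~Active() {}  // exit action
    // plus any state-local storage as ordinary members
  };

Splitting a "declaration" off into a separate header would leave nothing
of substance behind.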
> I'm not trying to be
> stubborn; I really don't understand why FSM is special in this regard.
See above.
>> Looking at the code above we notice several things:
>>
>> 1. Visibility: In the proposed FSM lib, an outer state "knows" only
>> its inner initial state (which can be an incomplete type) but none
>> of its other inner states (Active does not know that Running exists,
>> neither
>> does StopWatch). An inner state knows all its outer states. Now, that
>> it is done this way and not the other way round (outer states know
>> all their inner states but inner states don't know their outer
>> states) is
>> not a coincidence for the simple reason that the other way wouldn't
>> leave any room for parts in multiple TUs. I also think this is pretty
>> much the only sensible way to spread an FSM over multiple
>> TUs. I can't prove this (for me this is just obvious).
>
> Here you've come quite close to saying: "The requirements are X; I
> tried hard to satisfy X ...but I couldn't; therefore, it can't be
> done."
Define "quite close". I'm saying that *I* don't see other ways (you're
welcome to show such a way). I am *not* saying that it can't be done.
As mentioned above, I don't think such word games are very productive. I
won't argue any longer about this.
>> So, if you have any
>> objections be sure to attach a proposal on how you would select the
>> visibility to enable spreading over multiple TUs.
>
> Please -- I acknowledged at the beginning of my review that I hadn't
> had time to thoroughly examine the implementation and suggest
> improvements.
The visibility discussion is purely about the interface; it has nothing
to do with the implementation.
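
To make the interface point concrete: in the StopWatch example the outer
state Active (sketched earlier) names only its inner initial state
Stopped, and only as an incomplete type; it never mentions Running.
Running, which can live in its own TU, names Active as its context.
Roughly (simplified; the exact file/header layout is elided):

  // Main.cpp (or a header shared with Running.cpp)
  struct Active;  // an incomplete type is enough here
  struct StopWatch : fsm::state_machine< StopWatch, Active > {};
  // ... definition of Active as sketched earlier; it mentions Stopped
  // (incomplete) but not Running ...

  // Running.cpp
  struct Running : fsm::simple_state< Running, Active > {};
  // the inner state names its outer state, not the other way round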
> But I do have a little experience in C++, and I wanted
> to point out what I saw as gaps in your rationale.
That's fine with me.
>> 2. Connections: The proposed FSM lib connects the parts at link time,
>> through a call to the react function. This works fine but does leave
>> you
>> with the "problem" that there is no way to have a global view at your
>> FSM, *not* *even* *at* *runtime*. When the FSM still resides inside
>> Stopped you don't have *any* way of knowing that Running even exists.
>> Likewise, when the transition to Running has been made, you cannot
>> possibly know whether there are other states because Running might
>> itself define similar custom_reactions that lead you to other states
>> when the reactions are triggered. Note that the custom reaction
>> cannot possibly have any knowledge about the target state as you
>> then can no longer hide Running in a separate TU.
>
> Here you're just describing your particular implementation.
Yes.
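
For the record, the link-time connection looks roughly like this in the
StopWatch example (a sketch, simplified; see the tutorial for the full
code):

  // Main.cpp: Stopped declares a custom reaction but never names Running
  struct Stopped : fsm::simple_state< Stopped, Active,
    fsm::custom_reaction< EvStartStop > >
  {
    fsm::result react( const EvStartStop & ); // defined in another TU
  };

  // Running.cpp: the definition of the reaction supplies the target
  // state, so Stopped and Running are only connected when the linker
  // resolves the call to react
  fsm::result Stopped::react( const EvStartStop & )
  {
    return transit< Running >();
  }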
>> Now, let us consider other possibilities for how Running could be
>> connected
>> to Stopped (I'm neither assuming a particular lib interface nor a
>> particular lib implementation, the only thing that I assume is that
>> we have an FSM employing a hypothetical lib with the same states
>> Active, Stopped, Running spread over the TUs Running.cpp and
>> Main.cpp):
>
> Good, now you're going to consider alternatives.
It seems you haven't noticed that I did that in the Visibility
discussion too.
>> a) At compile time only: I don't see how that could work: Running.cpp
>> does not have any knowledge about Main.cpp and vice versa.
>
> As far as I'm concerned, you've just dismissed this out of hand
> because you can't see how to do it.
NO, I haven't dismissed that possibility; I've just stated a fact, namely
that *I* don't see how it could work. This also falls into the word
games category and I won't argue any more about this.
> Possibly the joints between parts of machines defined in separate TUs
> must be weaker (i.e., less known at compile time) than the
> connections between collections of states implemented in a single TU.
Yes, that occurred to me also. I don't see how I can make the link
weaker without sacrificing static checking (or other requirements). And
please: I'm NOT dismissing that possibility and concrete suggestions are
always *very* welcome.
>> b) At runtime only: State definition could contain a static instance
>> of a class, which, in its constructor, registers the state with its
>> outer state.
>
> This is well-known not to work, for the reasons you state.
>
>> c) Any combination of the above: I don't see how that could improve
>> things in any way
>
> If I was working on the library, I'd try a combination of your
> current technique and a)
Concrete suggestions welcome. Especially interesting would be a proof
that shows either of the following:
a) Different TUs can share (read/write) information at compile time
b) If a) isn't possible, how link-time connections can be "improved" by
CT information available from a single TU
I hope we agree that proving a) or b) is a lot less work than proving
that it can't be done.
>> 3. State-local storage: Consider what we need to do if we want to
>> give Running a new auxiliary state variable, e.g. the startTime_
>> mentioned
>> in the tutorial. Because Running has its own storage, we can simply
>> add a member to its class and are done. We don't need to change
>> Active or
>> the StopWatch.
>> If the proposed FSM lib didn't provide storage for each state, it
>> would force people to change the state_machine each time they need to
>> introduce such a variable (or at least once when they introduce the
>> first variable).
>
> I never said you should omit state-local storage.
Agreed. I added this discussion because you are dissatisfied with the
overall performance and I think that state-local storage is usually the
main perf bottleneck.
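
To recall what state-local storage looks like in practice: it is nothing
more than an ordinary member of the state class (sketch, simplified from
the tutorial):

  struct Running : fsm::simple_state< Running, Active,
    fsm::transition< EvStartStop, Stopped > >
  {
    Running() : startTime_( std::time( 0 ) ) {} // entry: remember start
    ~Running() { /* exit: e.g. accumulate the elapsed time */ }
  private:
    std::time_t startTime_; // added here only; Active and StopWatch are
                            // untouched
  };

Each entry into Running constructs a new Running object and each exit
destroys it, which is exactly the book-keeping that makes transitions more
expensive than pure dispatch.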
>> Point 1 & 2 limit the maximum possible dispatch performance, more on
>> that later.
>> Point 3 limits the maximum possible transition performance, which
>> will typically be the main performance bottleneck.
>
>>> If there are order of initialization problems, the arrays can be
>>> allocated on first use.
>
>> No, that is not an option. Unless the allocation and the filling of
>> the array always take considerably less time than the reaction
>> itself, this would prohibit the use of such an FSM lib in real-time
>> applications. In real-time applications you mostly cannot afford to
>> do such lazy
>> init stuff, because that would raise the upper limit of the reaction
>> time to often unacceptable levels.
>
> You snipped the context.
The context is irrelevant, see below.
> Jonathan Turkanis wrote:
>> When it needs to assemble
>> the array of pointers, it can simply ask each of them for their
>> contribution. If there are order of initialization problems, the
>> arrays can be allocated on first use.
>
> The first use would be when the machine is constructing the table of
> pointers, which would presumably be when it is constructed or
> immediately after it is started.
How does the machine construct the table of pointers when it doesn't
know what the full FSM layout looks like?
>>> In my experience, C++ provides extremely fine-grained control over
>>> which parts of a computation are performed at compile time and which
>>> at runtime. I have a strong suspicion that a FSM library could be
>>> written which satisfies the nine requirements and also provides
>>> ruthless efficiencies, at least in many common cases.
>>
>> I believe a library satisfying the 9 requirements can at best provide
>> O(n) dispatch for an FSM using all the features, where n is the
>> number of currently active orthogonal regions. So, if you don't use
>> orthogonal states you get constant time dispatch. The reason lies in
>> the fact that, as I hope I have shown above,
>
> Sorry, but no.
Too bad, it's the best I can do. As I said, it is much more difficult to
prove that a certain design is the best possible than to show with a
concrete proposal that an improvement is possible. I for one don't think
that spending any more effort on the former is justified.
>> you cannot know how your
>> whole FSM looks. Dispatch must always begin with a currently active
>> innermost state and consider its reactions first. Only if there is no
>> suitable reaction for the innermost state can the next outer state be
>> considered, and so on until an outermost state is reached. Now,
>> since the innermost state knows all its outer states at compile time,
>> it could collapse all the reactions of the outer states into a single
>> STT, yielding constant time lookup for each innermost state
>> *separately*. Hence, if there are n innermost states active at the
>> same time you have O(n) lookup. In any case I don't expect dispatch
>> to be the main performance bottleneck in a typical FSM, see above.
>
>>> It may turn out
>>> to be necessary to sacrifice performance for some FSMs, but I
>>> believe a hybrid approach may be possible which uses fast dispatch
>>> in the majority of cases. I'm willing to be convinced that it is
>>> impossible, but I haven't seen a satisfying explanation yet.
>
>> A hybrid approach could indeed be possible, but I don't think it can
>> be done without requiring help from the user (i.e. complicating the
>> interface) or constraining the functionality.
>
> Have you tried?
No, because my instincts say the opposite of what yours say.
> My instinct says it can be done.
Suggestions welcome.
>> For example you could
>> require the user to somehow (at CT or RT) register all the possible
>> states of a machine or you could prohibit custom_reactions (i.e.
>> require the implementation of the whole FSM in one TU).
>
> Again, I don't know what you mean by "registering all possible
> states". If it just means listing them in a typelist, for instance, I
Yes, that is one possibility.
> don't see why this can't be done separately for each TU that contains
> part of the state machine implementation. The definition of the whole
> machine doesn't necessarily have to enumerate all the states.
You seem to be ignoring orthogonal states.
> And I'm not suggesting you prohibit custom reactions.
Noted.
>> Both would
>> give you O(1) lookup. But let's not forget that lookup is typically
>> not the main performance bottleneck! Neither of these measures would
>> speed up transition execution.
>
> As far as I can see, you have not explained why state-local storage is
> expensive, even in your implementation. Is it dynamic allocation?
No.
> Why
> not construct small states into a properly aligned buffer?
You can do that with the current version by overloading operator new of
each state.
Construction is inefficient because that involves various book-keeping
tasks. I think it is best if you have a look at the code.
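
Roughly what I mean by overloading operator new (a user-side sketch; the
buffer size and the assumption that at most one Running object is alive
at a time are purely illustrative):

  // needs <cstddef> and <cassert>
  struct Running : fsm::simple_state< Running, Active >
  {
    void * operator new( std::size_t size )
    {
      // serve the allocation from a static, suitably aligned buffer
      static union { double aligner_; char buffer_[ 128 ]; } storage;
      assert( size <= sizeof( storage ) );
      return storage.buffer_;
    }
    void operator delete( void *, std::size_t ) {} // nothing to release
  };

This removes the dynamic allocation, but it does not remove the
construction/destruction book-keeping mentioned above.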
>> Transition execution can only be sped up by prohibiting state-local
>> storage, i.e. by further constraining the functionality.
>> "Why not?"
>> you might ask.
>
> Of course.
>
>> It seems that this can only be done if you complicate
>> the interface for everyone, including the users who need the full
>> functionality. A user requiring state-local storage would be forced
>> to somehow associate a state identifier with a type, which she can
>> then use as a container to hold the auxiliary variables.
>
> I can't see why this would be necessary. Honestly, I really don't
> understand what you're talking about here. As I said at the beginning
> I accept the requirements, including state local storage.
See above, because it is the main performance hog.
>> I guess here probably lies our main disagreement: For me the users
>> requiring the full functionality have the priority. Why? Because I
>> think that in a certain class of applications most non-trivial FSMs
>> will hugely benefit from state-local storage (at least some users
>> have confirmed that).
>
> Nothing I said in my review implied that you should sacrifice
> features for performance; all I want is a detailed explanation why
> this would be necessary.
Ok, it was just a guess.
> Others have suggested that the library might be accepted but renamed
> to indicate that it did not pretend to be the definitive C++
> implementation of FSMs. For instance, the proposed library would be
> "Boost.UMLStatecharts," while a separate library might cover faster
> state machines with fewer features.
>
> I'm against this unless it can be shown that the better performance
> can only be obtained by sacrificing functionality. I want to emphasize
> again that I phrased my review as an objection to the Rationale
> section of the documentation.
As I said, I can't explain it any better than I already did. This
doesn't mean that I won't answer questions.
If my explanation is not convincing, I have to accept the consequences.
>> To conclude the above:
>
>> 2. I agree that it is theoretically possible to implement a hybrid
>> library that supports both classes, but *only* at the cost of a much
>> more complex interface *and* a much more complex implementation. I
>> would even go as far as saying that two separate libraries optimized
>> for each class would have very little in common (interface or
>> implementation). I don't see any benefits in a hybrid approach other
>> than being able to say: "This is *the* solution for all your FSM
>> problems". And it could very well turn out that the hybrid interface
>> is so complex that users would not consider using it.
>
> My guess is that a smooth interface might be possible.
A concrete proposal is welcome.
>>> - The restriction on event deferral -- one must pass an event
>>> pointed to by an instance of intrusive_ptr -- is extremely
>>> inelegant:
>>>
>>> int main()
>>> {
>>>     myMachine.process_event(
>>>         *boost::intrusive_ptr<Event>( new Event() )
>>>     );
>>> }
>>>
>>> I'm sure there is a good reason for this restriction, but there must
>>> be a cleaner way to accomplish the same thing.
>>
>> Have you read the rationale for this?
>>
>> http://tinyurl.com/6xbo2 (Limitations, Deferring and posting events)
>
> You don't explain why a pointer to a dynamically allocated event
> can't be passed.
I could offer such an interface, but that would leave users guessing who
will be deleting the event after consumption.
>>> - I don't understand why it is necessary to pass a type as a
>>> length-one mpl::list simply because it is a class template
>>> instantiation. The docs say it is "due to some template-related
>>> implementation details", but this sounds fishy to me.
>>
>> This is the example in the docs:
>>
>> struct NotShooting : fsm::simple_state< NotShooting, Camera,
>>     /* ... */, mpl::list< fsm::deep_history< Idle > >,
>>     fsm::has_deep_history >
>> {
>>     // ...
>> };
>>
>> Internally, I use mpl::is_sequence to find out whether we already
>> have a list of inner initial states. If there's none I need to make
>> such a list. At the point of definition of NotShooting, Idle is an
>> incomplete type. Now the problem is that mpl::is_sequence triggers an
>> instantiation of the deep_history template, and I have not found a way
>> to prevent access to Idle inside that template. When you wrap
>> that in a list the deep_history template is not instantiated and Idle
>> is not accessed. I'm sure there is a solution but the problem simply
>> never made it to the top of my to-do list.
>
> Perhaps you could automatically wrap the template argument at that
> position in a sequence-wrapper which turns most types into length-one
> sequences but is specialized for mpl::list to pass through the
> intrinsic operations.
That might work, yes.
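
For the record, the wrapping could look something like the following (all
names made up; one specialization per list arity would be needed, or they
could be generated with the preprocessor library). The point is that
matching against the partial specializations never instantiates the passed
type, so Idle would not be accessed:

  template< class T >
  struct as_inner_list                        // not an mpl::list: wrap it
  {
    typedef mpl::list< T > type;
  };

  template< class T0 >
  struct as_inner_list< mpl::list< T0 > >     // already a list: pass through
  {
    typedef mpl::list< T0 > type;
  };

  template< class T0, class T1 >
  struct as_inner_list< mpl::list< T0, T1 > >
  {
    typedef mpl::list< T0, T1 > type;
  };
  // ... and so on for the remaining arities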
Regards,
--
Andreas Huber
When replying by private email, please remove the words spam and trap
from the address shown in the header.