Subject: Re: [boost] Push/pull parsers & coroutines
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2017-10-14 19:03:34
Vinnie Falco wrote:
> On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost
> <boost_at_[hidden]> wrote:
>> A "push" parser,
>> which invokes client callbacks as tokens are processed, is easier to
>> implement but harder to use as the client has to track its state
>> between callbacks with e.g. an explicit FSM. On the other hand, a
>> "pull parser" (possibly using an iterator interface) is easier for
>> the client but instead now the parser may need the explicit state
>> tracking.
>
> That is generally true, and especially true for XML and other
> languages that have a similar structure. Specifically, that there are
> opening and closing tags which determine the validity of subsequent
> grammar, and have a recursive structure (like HTML).
>
> But this is not the case for HTTP. There are no opening and closing
> tags. There is no need to keep a "stack" of "open tags". It is quite
> straightforward. Therefore, when designing an HTTP parser we can place
> less emphasis on the style of parser and instead focus those energies
> to other considerations (as I described in my previous post, regarding
> the separation of concerns for stream algorithms and parser
> consumers).
>
> If you look at the Beast parser derived class, you can see that the
> state is quite minimal:
>
> template<bool isRequest, class Body, class Allocator>
> class parser
> : public basic_parser<isRequest, parser<isRequest, Body, Allocator>>
> {
> message<isRequest, Body, basic_fields<Allocator>> m_;
> typename Body::writer wr_;
> bool wr_inited_ = false;
> std::function<...> cb_h_; // for manual chunking
> std::function<...> cb_b_; // for manual chunking
> ...
You still have an explicit state machine, i.e. a state enum and a
switch statement in a loop; I'm looking at impl/basic_parser.ipp for
example.
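
To be concrete about the style I mean, here is a minimal sketch of a
push parser with an explicit state machine. It is not Beast's code:
line_parser, put() and the "name: value" line grammar are all made up
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical push parser for "name: value\n" lines (illustration only).
class line_parser {
public:
    using callback =
        std::function<void(const std::string&, const std::string&)>;

    explicit line_parser(callback cb) : cb_(std::move(cb)) {
        assert(cb_ != nullptr);  // a callback is required
    }

    // Feed any amount of input; parsing state survives between calls.
    void put(const char* data, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            const char c = data[i];
            switch (state_) {
            case state::name:
                if (c == ':') state_ = state::skip_space;
                else name_ += c;
                break;
            case state::skip_space:
                if (c == ' ') break;
                state_ = state::value;
                on_value_char(c);  // c already belongs to the value
                break;
            case state::value:
                on_value_char(c);
                break;
            }
        }
    }

private:
    enum class state { name, skip_space, value };

    // A newline ends the line and fires the callback; anything else
    // is appended to the value.
    void on_value_char(char c) {
        if (c == '\n') {
            cb_(name_, value_);
            name_.clear();
            value_.clear();
            state_ = state::name;
        } else {
            value_ += c;
        }
    }

    state state_ = state::name;
    std::string name_;
    std::string value_;
    callback cb_;
};
```

The client passes a callback and calls put() with whatever bytes happen
to arrive; state_ is what lets a line straddle two calls.
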
But I don't want to dwell on this particular code. I'm just considering,
generally, whether this style of code is soon going to look "antique" -
in the way that 15-year-old code full of explicit new and delete looks
antediluvian now that we're all using smart pointers.
I think it's clear that coroutines can often make the code simpler to
write and/or easier to use. The question is what we lose. That
generator<T> provides only input iterators is the most significant
problem I've spotted so far. This is in some way related
to the whole ASIO "buffer sequence" thing; the code I posted before
read into contiguous buffers, but that was lost before the downstream
code saw it, so it couldn't hope to optimise with e.g. word-sized
copies or compares. Maybe this could be fixed with some sort of segmented
iterator, or something other than generator<T> as the coroutine type,
or something. Or maybe it's unfixable.
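
As a rough sketch of one shape the "segmented" idea could take: the
producer hands out whole contiguous segments rather than single
characters, so the consumer can still use memchr and friends within
each segment. The names segment, segment_source and count_byte are
hypothetical, and the vector of strings merely stands in for buffers
filled by successive reads; a coroutine version would presumably yield
segments instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// One contiguous run of bytes, such as a single read into a buffer.
struct segment {
    const char* data;
    std::size_t size;
};

// Hypothetical pull-style source that yields segments, not characters.
class segment_source {
public:
    explicit segment_source(std::vector<std::string> chunks)
        : chunks_(std::move(chunks)) {}

    // Writes the next segment to out; returns false at end of input.
    bool next(segment& out) {
        if (pos_ == chunks_.size()) return false;
        out = segment{chunks_[pos_].data(), chunks_[pos_].size()};
        ++pos_;
        return true;
    }

private:
    std::vector<std::string> chunks_;
    std::size_t pos_ = 0;
};

// Counts occurrences of c, scanning each segment with memchr. With a
// character-at-a-time input iterator this bulk scan is not available.
std::size_t count_byte(segment_source& src, char c) {
    std::size_t n = 0;
    segment s;
    while (src.next(s)) {
        const char* p = s.data;
        const char* const end = s.data + s.size;
        while (p != end) {
            const void* hit =
                std::memchr(p, c, static_cast<std::size_t>(end - p));
            if (!hit) break;
            const char* q = static_cast<const char*>(hit);
            assert(q >= p && q < end);  // memchr stays inside the segment
            ++n;
            p = q + 1;
        }
    }
    return n;
}
```

Anything that spans a segment boundary (a token split across two reads,
say) still needs handling by the consumer, which is where this gets
less pretty.
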
Do other languages have anything to teach us about this? What do
users of Boost.Coroutine think?
Regards, Phil.