From: Jerker Öhman (jerker.ohman_at_[hidden])
Date: 2007-01-08 14:33:33
We use iostreams::stream to read from a legacy file format. The code 
that reads from the stream looks a little like this:
void ReadStuff(std::istream &stream,
                Stuff &stuff,
                OtherStuff &otherStuff)
{
    StreamDirectory streamDir;
    stream >> streamDir;
    stream.seekg(streamDir.GetOffset(STUFF_TAG));
    stream >> stuff;
    stream.seekg(streamDir.GetOffset(OTHER_STUFF_TAG));
    stream >> otherStuff;
}
That is, the stream contains a directory that holds information about 
the items following the directory and the offsets to them. The stream 
and the code are also organized so that the calls to seekg are in most 
cases to the current position of the stream. In other words, seekg 
doesn't have to do anything.
If the stream is buffered, the code runs into serious performance 
problems. After a few calls, seekg ends up in this function:
template<typename T, typename Tr, typename Alloc, typename Mode>
typename indirect_streambuf<T, Tr, Alloc, Mode>::pos_type
indirect_streambuf<T, Tr, Alloc, Mode>::seek_impl
     (stream_offset off, BOOST_IOS::seekdir way, BOOST_IOS::openmode which)
{
     if (pptr() != 0)
         this->BOOST_IOSTREAMS_PUBSYNC(); // sync() confuses VisualAge 6.
     if (way == BOOST_IOS::cur && gptr())
         off -= static_cast<off_type>(egptr() - gptr());
     setg(0, 0, 0);
     setp(0, 0);
     return obj().seek(off, way, which, next_);
}
and as far as I can see it just dumps the internal buffer and passes the 
call on. After a few more calls we end up in the stream source's seek 
function. This wouldn't be so bad if it weren't for the fact that the 
stream source's offset isn't the same as the stream's offset, since the 
stream is buffered.
Perhaps a little example would clarify this. Assume that we read 4 bytes 
from a stream with a 10k buffer. The stream will then fill its buffer 
from the underlying source, which means that after the read the stream 
has a full 10k buffer and an offset of 4 into that buffer. The 
underlying stream source has an offset pointer that points 10k bytes 
into the underlying stream. If we now call seekg to position the file 
at offset 4 (which is already the current position), the stream throws 
away its buffer and we end up in the stream source, whose file position 
is 10k, so it also throws away its internal buffers and seeks back in 
the underlying stream to offset 4. In this way, what should have been a 
no-op turns into something very time-consuming.
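To make the scenario above concrete, here is a toy model of it. This is 
not Boost code: CountingSource, BufferedReader and readSeekRead are 
hypothetical names made up for illustration, and the "keep the buffer" 
branch is only one possible way a buffered seek could behave. The sketch 
counts how often the underlying source is touched when we read 4 bytes, 
seek to the current position and read 4 more bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the underlying stream source. It only counts
// how often it is asked to read or seek.
struct CountingSource {
    std::string data;
    std::size_t pos = 0;
    int reads = 0;
    int seeks = 0;

    std::size_t read(char* dst, std::size_t n) {
        ++reads;
        n = std::min(n, data.size() - pos);
        std::copy(data.begin() + pos, data.begin() + pos + n, dst);
        pos += n;
        return n;
    }
    void seek(std::size_t p) { ++seeks; pos = std::min(p, data.size()); }
};

// Toy buffered reader (an illustration, not the Boost implementation).
class BufferedReader {
public:
    BufferedReader(CountingSource& src, std::size_t bufSize, bool keepBuffer)
        : src_(src), buf_(bufSize), keepBuffer_(keepBuffer) {}

    std::size_t read(char* dst, std::size_t n) {
        std::size_t done = 0;
        while (done < n) {
            if (cur_ == end_) {            // buffer exhausted: refill it
                bufStart_ = src_.pos;
                end_ = src_.read(buf_.data(), buf_.size());
                cur_ = 0;
                if (end_ == 0) break;      // end of data
            }
            std::size_t k = std::min(n - done, end_ - cur_);
            std::copy(buf_.begin() + cur_, buf_.begin() + cur_ + k, dst + done);
            cur_ += k;
            done += k;
        }
        return done;
    }

    // Logical position as seen by the caller.
    std::size_t tell() const { return bufStart_ + cur_; }

    void seek(std::size_t target) {
        // Assumed alternative: a target inside the buffered range only
        // moves the read cursor and never reaches the source.
        if (keepBuffer_ && target >= bufStart_ && target <= bufStart_ + end_) {
            cur_ = target - bufStart_;
            return;
        }
        // Behaviour described above: dump the buffer, pass the seek on.
        cur_ = end_ = 0;
        bufStart_ = target;
        src_.seek(target);
    }

private:
    CountingSource& src_;
    std::vector<char> buf_;
    bool keepBuffer_;
    std::size_t bufStart_ = 0;  // source offset of buf_[0]
    std::size_t cur_ = 0;       // read cursor inside the buffer
    std::size_t end_ = 0;       // number of valid bytes in the buffer
};

struct Counts { int reads; int seeks; };

// Read 4 bytes, seek to the current position, read 4 more bytes.
Counts readSeekRead(bool keepBuffer) {
    CountingSource src;
    src.data.assign(100 * 1024, 'x');
    BufferedReader in(src, 10 * 1024, keepBuffer);
    char tmp[4];
    in.read(tmp, 4);      // fills the 10k buffer; logical position is 4
    in.seek(in.tell());   // should be a no-op
    in.read(tmp, 4);
    return Counts{src.reads, src.seeks};
}
```

With keepBuffer set to false the source sees two 10k reads and one seek 
for 8 bytes of payload; with keepBuffer set to true it sees a single read 
and no seek. The real cost in our case is of course much larger, since 
the underlying source has buffers of its own to throw away.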
Previously we used a class that inherited directly from 
std::basic_streambuf and contained horrible code that no one really 
understood, so switching to boost::iostreams was a blessing. 
Unfortunately the boost::iostreams implementation is 10 times slower 
when it is buffered and 50% slower when it is unbuffered. When reading 
files that are a couple of GB in size, that really matters.
/Jerker