From: Justin McManus (justin_at_[hidden])
Date: 2020-09-05 18:23:36


I have some code that works as intended, but only when I set the
buffer_size parameter to zero for both the bzip2_decompressor and the
std::ifstream pushed onto a filtering chain. I'd like to understand why,
to ensure I'm not introducing a bug or a hack.

I have essentially the following code:
--------------------------------------------------------------------------------------------------------
std::ifstream m_jf("json_filename",
                   std::ios_base::in | std::ios_base::binary);
std::locale utf8_locale("en_US.UTF-8");
m_jf.imbue(utf8_locale);

boost::iostreams::filtering_istream m_inbuf;
m_inbuf.push(boost::iostreams::bzip2_decompressor());
m_inbuf.push(m_jf);

std::string m_line;
while (std::getline(m_inbuf, m_line)) {
  // Process the current line from the JSON file
}
--------------------------------------------------------------------------------------------------------

What I find is that the std::getline call will fail before the code has
reached the EOF. It will always fail at the same line in a given JSON file,
but it will fail on different lines in different JSON files. It's perfectly
reproducible.
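
One way to narrow this down is to check which state bits are set on the
stream when the loop exits. Below is a minimal sketch using only the
standard library. The names read_all_lines, ReadReport, and StopReason are
hypothetical and are not part of my code or of Boost. It also assumes that
an error raised inside the filter chain shows up as badbit on the stream,
which I have not confirmed for bzip2_decompressor.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for exercising the helper
#include <string>

// Hypothetical helper types, introduced only for illustration.
enum class StopReason {
    CleanEof,      // eofbit set, badbit clear: the source ran out of data
    ReadError,     // badbit set: the underlying stream buffer reported an error
    OtherFailure   // failbit only: getline failed for some other reason
};

struct ReadReport {
    std::size_t lines = 0;                    // lines successfully read
    StopReason reason = StopReason::CleanEof; // why the loop stopped
};

// Reads lines until std::getline fails, then classifies the failure
// by inspecting the stream state bits.
inline ReadReport read_all_lines(std::istream& in) {
    ReadReport report;
    std::string line;
    while (std::getline(in, line)) {
        ++report.lines;  // real code would process the line here
    }
    if (in.bad()) {
        report.reason = StopReason::ReadError;
    } else if (in.eof()) {
        report.reason = StopReason::CleanEof;
    } else {
        report.reason = StopReason::OtherFailure;
    }
    return report;
}
```

Passing m_inbuf to read_all_lines in place of the loop would show whether
the early stop looks like a genuine end of data or an error. Calling
m_inbuf.exceptions(std::ios_base::badbit) before reading is another option:
the stream then throws instead of silently setting badbit.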

However, if I change the two push calls to
     m_inbuf.push(boost::iostreams::bzip2_decompressor(), 0);
     m_inbuf.push(m_jf, 0);
then the problem goes away.

My question is: why does setting the buffer_size parameter to zero solve
the issue? What does this do, exactly? I found the suggestion to set the
buffer size this way in an old post from 2009, and it appears to work, but
I'd like a deeper understanding of what's happening under the hood. If the
buffer size is set to zero, what does the underlying implementation do, and
how might this influence whether std::getline fails before the EOF?

Thanks very much,
Justin

-- 
Justin McManus, Ph.D.
Principal Scientist
Lead Computational Biologist and Statistical Geneticist
Kallyope, Inc.
430 East 29th Street, Suite 1050
New York, NY 10016