From: Oleg Abrosimov (beholder_at_[hidden])
Date: 2006-05-17 20:55:06
n1803/n1982 Simple Numeric Access proposal is broken
I've implemented the long stol(string& str, int base = 10) function from 
the n1982 proposal (see the code at the bottom of this message) and ran 
into some issues:
issue 1: non-const version (string& str) breaks code like
long l = stol("10");
usage is limited to:
string s = "10";
long l = stol(s);
issue 2: the "function call strtol(str.c_str(), 0, base)" requirement 
cannot be fulfilled.
It conflicts with the "erases the characters from the front of str that 
were converted to get the result" requirement and with the "Throws: 
invalid_argument if strtol, strtoul, strtoll, or strtoull reports that no 
conversion could be performed" requirement. Both need the end pointer 
that strtol returns through its 2nd argument, so error checking prevents 
passing '0' there.
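To make issue 2 concrete, here is a minimal sketch. The helper names 
parse_without_endptr and parse_with_endptr are made up for this 
illustration and are not part of n1982. With a null 2nd argument, a 
failed conversion and a successful parse of "0" both return 0, so the 
caller cannot tell them apart; with a real end pointer it can.

```cpp
#include <cstdlib>

// Hypothetical helper: calls strtol the way the n1982 wording spells it,
// with a null end pointer. A failed conversion returns 0, just like "0".
long parse_without_endptr(const char* s)
{
    return std::strtol(s, 0, 10);
}

// Hypothetical helper: passes a real end pointer, so "nothing was
// converted" can be detected by comparing it with the start pointer.
bool parse_with_endptr(const char* s, long& out)
{
    char* end = 0;
    out = std::strtol(s, &end, 10);
    return end != s; // false means no characters were consumed
}
```

The same end pointer is also what tells the caller how many characters 
were consumed, which the "erase the converted characters" requirement 
depends on.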
issue 3: a simple performance test shows that the proposed function is 
1000 times slower than the iostreams solution (the test used 100000 
samples).
Platform: WinXP VC71 STLPort 4.6.2
All these issues (except for issue 2) could be solved by removing the 
"... erase the characters from the front of str that were converted to 
get the result. ..." requirement.
Issue 2 has a trivial solution: replace "function call 
strtol(str.c_str(), 0, base)" with "function call strtol with 
appropriate arguments".
The fixed code is given in the long stol_fixed(string const& str, int 
base = 10) function.
This stol_fixed function is less powerful than the originally proposed 
version (if we forget about the issues mentioned above): it does not 
allow parsing multiple numbers from one long string. The C library's 
strtol function allows that, but if all error conditions have to be 
checked, the code becomes too complex for a simple parsing task.
On the other hand, all the issues with the proposed stol function come 
from an attempt to merge two needs into one simple interface: (1) 
parsing multiple numbers from one string; (2) parsing one number from a 
string.
(2) can be accomplished with a simple function interface, but (1) has 
to keep state (a pointer to the beginning of the character sequence to 
parse from). In the n1982 proposal this state is kept by erasing the 
parsed characters, but that is too expensive because of the memory 
[de]allocations it causes.
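As a rough sketch of keeping that state without erasing anything, the 
function below advances a pointer through the string instead. The name 
parse_all_longs and its error policy (std::out_of_range on ERANGE, 
std::invalid_argument on trailing garbage) are assumptions made for 
this example, not something n1982 specifies.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses every whitespace-separated long in str.
// The "state" is just the cur pointer; the string itself is never modified.
std::vector<long> parse_all_longs(const std::string& str, int base = 10)
{
    std::vector<long> result;
    const char* cur = str.c_str();
    const char* const end = cur + str.size();
    for (;;) {
        char* next = 0;
        errno = 0; // clear stale values before each call
        long value = std::strtol(cur, &next, base);
        if (next == cur) break; // nothing converted: stop
        if (errno == ERANGE) throw std::out_of_range("value out of range");
        result.push_back(value);
        cur = next; // advance instead of erasing
    }
    // Only trailing whitespace may remain after the last number.
    while (cur != end && std::isspace(static_cast<unsigned char>(*cur))) ++cur;
    if (cur != end) throw std::invalid_argument("unparsed characters remain");
    return result;
}
```

Even this small loop needs three separate error checks, which is the 
complexity argument made above.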
It seems that a stream-like interface would be good for (1), for example:
std::string s("10 20 30 40");
cistream cs(s);
long l;
cs >> l;
int i;
cs >> i;
long arr[10];
for(int i = 0; i < 10; ++i) {
   cs >> arr[i]; // exception would be thrown (read after eof)
}
or, better:
std::string s("10 20 30 40");
cistream cs(s);
long l = cs.read_long(); //convenience function
int i = cs.read_int(); //convenience function
long arr[10];
for(int i = 0; i < 10; ++i) {
   cs >> arr[i]; // exception would be thrown (read after eof)
}
This solution would support:
cistream, costream, ciostream
wcistream, wcostream, wciostream
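To give a feel for what such a component might look like, here is a 
deliberately small sketch of a cistream. Everything in it is an 
assumption: it only handles long, it reads through strtol, and it 
reports "read after eof" by throwing std::invalid_argument. The 
read_int function and the other stream flavours listed above are left 
out.

```cpp
#include <cerrno>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical sketch: a read-only cursor over a std::string that parses
// numbers with the C library and never modifies the string.
class cistream {
public:
    // The caller must keep s alive and unmodified while the stream is used.
    explicit cistream(const std::string& s) : cur_(s.c_str()) {}

    // Convenience function: reads the next long or throws.
    long read_long(int base = 10)
    {
        char* next = 0;
        errno = 0;
        long value = std::strtol(cur_, &next, base);
        if (next == cur_)
            throw std::invalid_argument("cistream: no number to read");
        if (errno == ERANGE)
            throw std::out_of_range("cistream: value out of range");
        cur_ = next; // remember where the next read starts
        return value;
    }

    cistream& operator>>(long& out)
    {
        out = read_long();
        return *this;
    }

private:
    const char* cur_; // current read position inside the caller's string
};
```

Because the only state is one pointer, constructing a cistream costs 
nothing beyond storing that pointer.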
One more issue with the n1982 proposal, and with the one I've proposed 
in this group, is that both hide from the C++ programmer the fact that 
he is using wrappers around C-library I/O functions, which are 
incompatible with C++ locales. The solution would be to encode this in 
the component names (like the 'c' symbol in the names above).
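The locale mismatch can be shown with a small sketch. The facet 
comma_grouping and both helper functions are invented for this example. 
It assumes the program has not called setlocale, so strtol runs in the 
default "C" locale.

```cpp
#include <cstdlib>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: groups digits in threes, separated by ','.
struct comma_grouping : std::numpunct<char> {
    char do_thousands_sep() const { return ','; }
    std::string do_grouping() const { return "\3"; }
};

// Parses text through an istringstream imbued with the facet above,
// so the C++ locale decides how the digits are grouped.
long parse_with_cpp_locale(const std::string& text)
{
    std::istringstream in(text);
    in.imbue(std::locale(in.getloc(), new comma_grouping));
    long value = 0;
    in >> value;
    return value;
}

// Parses the same text with strtol, which knows nothing about C++ locale
// objects. (No error checking here; this is an illustration only.)
long parse_with_c_library(const std::string& text)
{
    return std::strtol(text.c_str(), 0, 10);
}
```

For the input "1,234" the stream version accepts the separator, while 
strtol stops at the comma, so the two functions disagree on the same 
text.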
NOTE: with the from_string function that I proposed earlier, the 
"parsing one number from a string" task based on C-library functions can 
be implemented as a from_string_c function.
It cannot be safely merged with the from_string function because of the 
C vs. C++ locale issues. The same applies to the (w)string_from 
function: its C-wrapper counterpart would be a (w)string_from_c 
function. Better naming suggestions are welcome.
(a _byc suffix?)
Best,
Oleg Abrosimov.
// code begins
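#include <locale>
#include <cstdlib>
#include <cerrno>
#include <climits>   // LONG_MAX, LONG_MIN
#include <stdexcept> // invalid_argument, overflow_error, underflow_error
#include <string>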
// 1) "12 23 34 34 56 78" - stream-like interface is the best in C++
// 2) " 12" - conversion function is appropriate
namespace std { namespace tr2 {
   // issue 1: non-const version breaks code like
   // "long l = stol("10");"
   // usage is limited to:
   // string s = "1 2";
   // long l = stol(s);
   // long l1 = stol(s);
   //long stol(string const& str, int base = 10)
   long stol(string& str, int base = 10)
   {
       char* endptr = 0;
       const char* nptr = str.c_str();
       errno = 0; // strtol sets errno only on failure, so clear stale values
       long res = strtol(nptr, &endptr, base);
       if (endptr == nptr) {
           throw std::invalid_argument("stol invalid argument. str = '" + str + "'");
       }
       if (errno == ERANGE) {
           switch(res) {
               case LONG_MAX :
                   throw std::overflow_error("stol overflow. str = '" + str + "'");
               case LONG_MIN :
                   throw std::underflow_error("stol underflow. str = '" + str + "'");
           }
       }
       // performance killer !!!!!!!
       str.erase(0, endptr - nptr);
       return res;
   }
   // issue 2: "function call strtol(str.c_str(), 0, base)" requirement
   // cannot be fulfilled
   // it conflicts with "erases the characters from the front of str
   // that were converted to get the result"
   // and with "Throws: invalid_argument if strtol, strtoul, strtoll, or
   // strtoull reports that no conversion could be performed"
   // requirements (error checking prevents usage of '0' for 2nd argument)
   long stol_fixed(string const& str, int base = 10)
   {
       char* endptr = 0;
       const char* nptr = str.c_str();
       errno = 0; // strtol sets errno only on failure, so clear stale values
       long res = strtol(nptr, &endptr, base);
       if (endptr == nptr) {
           throw std::invalid_argument("stol invalid argument. str = '" + str + "'");
       }
       if (errno == ERANGE) {
           switch(res) {
               case LONG_MAX :
                   throw std::overflow_error("stol overflow. str = '" + str + "'");
               case LONG_MIN :
                   throw std::underflow_error("stol underflow. str = '" + str + "'");
           }
       }
       return res;
   }
}}
// performance test code begins
// it uses profiler code by Christopher Diggins
#include <iostream>
#include <sstream>
#include <limits>
#include <vector>
#include <boost/profiler.hpp>
#include <boost/lexical_cast.hpp>
#ifdef max
#undef max
#endif
#ifdef min
#undef min
#endif
// 1) create a string of long values
// 2) read 'em using std::stringstream
// 3) same with stol()
int main()
try {
   // 1) create a string of long values
   const long lMax = std::numeric_limits<long>::max();
   const long lCount = 1000000L;
   const long lMin = std::numeric_limits<long>::max() - lCount;
   std::string sLongs;
   std::vector<std::string> vecLongs;
   vecLongs.reserve(lCount);
   for (long l = lMax; l > lMin; --l) {
       std::string s = boost::lexical_cast<std::string>(l);
       sLongs += (s + '\t');
       vecLongs.push_back(s);
   }
   sLongs = sLongs.substr(0, sLongs.length() - 1);
   // 2) read 'em using std::stringstream
   {
       boost::prof::profiler p(": read em using std::stringstream");
       std::istringstream ss(sLongs);
       long l;
       volatile long vl;
       while (!sLongs.empty() && ss.good() && !ss.eof()) {
           ss >> l;
           volatile const char* nptr = sLongs.c_str();
           // uncomment it to simulate std::tr2::stol timings
           //sLongs.erase(0, 11);
           vl = l;
       }
   }
   // 3) same with stol_fixed()
   // 3.6 times faster than (2)
   {
       boost::prof::profiler p("same with stol_fixed()");
       long l;
       volatile long vl;
       for(long i = 0; !sLongs.empty() && i < lCount; ++i) {
           l = std::tr2::stol_fixed(vecLongs[i]);
           vl = l;
       }
   }
   // 4) same with stol()
   // 1000 times slower than iostreams solution
   {
       boost::prof::profiler p("same with stol()");
       long l;
       volatile long vl;
       while (!sLongs.empty()) {
           l = std::tr2::stol(sLongs);
           vl = l;
       }
   }
} catch (std::exception& ex) {
   std::cerr << ex.what() << std::endl;
} catch (...) {
   std::cerr << "Unknown exception occurred" << std::endl;
}