From: Daryle Walker (darylew_at_[hidden])
Date: 2008-08-14 19:58:44
On Aug 14, 2008, at 1:48 AM, Robert Ramey wrote:
> Daryle Walker wrote:
>> I was thinking about adding serialization to some types I've been
>> working on in the sandbox.  First I tried to recall how Mr. Ramey
>> said serialization can be tested.  I couldn't find the specific post
>> I was thinking about, but others that were found gave me the answer.
>> Reading other posts in that search prompted me to ask more questions.
>>
>> I could reduce the classes I'm working with to:
>>
>> //=============================================
>> class computer;
>>
>> class context
>> {
>> public:
>>     typedef boost::array<uint_least32_t, 4>  value_type;
>>
>>     context();  // use auto copy-ctr, copy-=, dtr
>>
>>     void        operator ()( bool );  // consumer
>>     bool        operator ==( context const & ) const;  // equals
>>     bool        operator !=( context const & ) const;  // not-equals
>>     value_type  operator ()() const;  // producer
>>
>> private:
>>     friend class computer;
>>
>>     boost::uint_fast64_t            length;
>>     boost::array<uint_fast32_t, 4>  buffer;
>>     boost::array<bool, 512>         queue;
>>
>>     template < class Archive >
>>     void  serialize( Archive &ar, const unsigned int version );
>> };
>>
>> class computer
>>     : public convenience_methods_base<context>
>> {
>>     // An object of type "context" is incorporated in this object
>>     // due to the base class.  A mutable/const pair of non-static
>>     // member functions named "context()" gives access to the inner
>>     // context object.
>>
>> public:
>>     typedef context::value_type  value_type;
>>
>>     // Put various access member functions here that forward to the
>>     // internals of the "context" type, which work because of the
>>     // friend declaration.
>>
>> private:
>>     template < class Archive >
>>     void  serialize( Archive &ar, const unsigned int version );
>> };
>> //=============================================
>>
>> I initially planned to have serialization functions for these two
>> classes, the "convenience_methods_base" base class template, plus two
>> other class templates (a base class and a support class) that
>> "convenience_methods_base" uses.  But the e-mail search I mentioned
>> found a thread from May 2007 (on the main Boost list) that suggested
>> that the serialization of a non-primitive should match the user's
>> external representation of the type, and not the type's particular
>> internal structure.  So I decided to keep the serialization protocol
>> just for the two public-facing classes, "context" and "computer."
>
> I don't think I've ever said anything like this - at least not
> intentionally.
It wasn't you.  The first post of the (sub)thread I mentioned is  
"[boost] Serialization support, Was:  [BoostCon07][testing] Session  
reminder." by Peter Dimov on 2007-May-4 at 6:38 PM.
> The bedrock of the serialization is the composition of serialization
> function calls, which reflects the underlying composition of the data
> items into classes or types.
>
> In a couple of very unusual cases (shared_ptr is the canonical
> example), this composition is not possible.  But I would emphasize
> that these are unusual to the point of being pathological.
>
> So in your case, I would just make each type serializable
> in terms of its components.
What I'm trying to do is to compose the serialization from the  
perceived components, not what's actually there.  This makes it  
robust against implementation changes.
> Note that doing this will preserve the private nature of the
> serialization functions.  If they are made private and implemented
> as member functions, they become internal implementation
> details of the class.  So you don't complicate or pollute
> your design by exposing implementation details in the
> public interface.
>
>>
>> I figured that the "computer" object can be serialized like:
>>
>> //=============================================
>> template < class Archive >
>> inline void  computer::serialize( Archive &ar, const unsigned int
>> version )
>> { ar & boost::serialization::make_nvp("context", this->context()); }
>
> This is not how I would do it.  I would
>
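> template < class Archive >
> inline void  computer::serialize( Archive &ar, const unsigned int
> version ){
>     // "base" is an arbitrary tag name for the base sub-object
>     ar & boost::serialization::make_nvp( "base",
>      boost::serialization::base_object<
>      convenience_methods_base<context> >( *this ) );
> }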
>
> which would in turn eventually call the serialization implemented in
> convenience_methods_base<context>.
>
>> //=============================================
The base classes and any helper data within them are just  
scaffolding.  The context sub-object is the only important part.  I  
don't care about preserving the other sub-objects; those parts refer  
to each other or the context sub-object, and the referents are fixed  
at class-design or construction time; so on-the-fly saving and  
loading isn't needed.  Here's an example:
//=============================================
class sample
{
     int  a[4], &b;
     template < class Archive >
     void  serialize( Archive &ar, const unsigned int )
     { ar & BOOST_SERIALIZATION_NVP( a ); }
public:
     sample()  : a(), b( a[2] ) {}
     // make custom copy-ctr and op=
};
//=============================================
I shouldn't have to serialize the "b" member because its setting is  
always fixed.  I can just serialize the "a" member.
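A minimal sketch of the custom copy operations that the comment in "sample" alludes to, assuming the intent is that "b" always aliases the object's own a[2].  Serialization is left out so the fragment needs only the standard library, and the members set(), via_b() and self_bound() are hypothetical helpers added purely to make the behavior observable:

```cpp
#include <algorithm>
#include <cassert>

// Stand-alone variant of "sample" (no serialization).  The accessors
// set(), via_b() and self_bound() are hypothetical, for demonstration only.
class sample
{
    int  a[4], &b;
public:
    sample()  : a(), b( a[2] ) {}

    // Copy the array contents, but bind "b" to this object's own a[2],
    // never to the source object's array.
    sample( sample const &o )  : a(), b( a[2] )
    { std::copy( o.a, o.a + 4, a ); }

    // A reference cannot be reseated, and does not need to be: only the
    // array contents are assigned.
    sample &  operator =( sample const &o )
    { std::copy( o.a, o.a + 4, a );  return *this; }

    void  set( int i, int v )  { assert( 0 <= i && i < 4 );  a[i] = v; }
    int   via_b() const  { return b; }
    bool  self_bound() const  { return &b == &a[2]; }
};
```

With these in place, a copy keeps its own "b" binding, so restoring "a" alone is enough to restore the whole object state.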
Anyway, those support classes weren't always there, and they may  
change again (or be removed).  Why should I have to preserve them,  
and possibly update the serialization routine (and probably therefore  
the version number) of my public class when the bases change?  I'm  
already serializing the only run-time-mutable sub-object I, and the  
users, would care about.
Actually the implementation uses a public member function "context".   
(Well, it's part of the public base class.)  So I could move the  
routine to make the serialization for "computer" completely non- 
intrusive.
>> Which leaves how "context" objects are serialized.  After thinking
>> about it for hours, I decided to just whip out something quick &
>> dirty and refine it later.  So:
>>
>> //=============================================
>> template < class Archive >
>> inline void  context::serialize( Archive &ar, const unsigned int
>> version )
>> {
>>     ar & BOOST_SERIALIZATION_NVP( length )
>>        & BOOST_SERIALIZATION_NVP( buffer )
>>        & BOOST_SERIALIZATION_NVP( queue );
>> }
>
> Which looks fine by me.
>
>> //=============================================
>>
>> would give a final serialization, in my test file, of:
>>
>> //=============================================
>> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
>> <!DOCTYPE boost_serialization>
>> <boost_serialization signature="serialization::archive" version="5">
>> <test class_id="0" tracking_level="0" version="0">
>> <context class_id="1" tracking_level="0" version="0">
>> <length>1</length>
>> <buffer class_id="2" tracking_level="0" version="0">
>> <elems>
>> <count>4</count>
>> <item>1732584193</item>
>> <item>4023233417</item>
>> <item>2562383102</item>
>> <item>271733878</item>
>> </elems>
>> </buffer>
>> <queue class_id="3" tracking_level="0" version="0">
>> <elems>
>> <count>512</count>
>> <item>1</item>
>> <item>0</item>
>> <!-- I'll spare you, and the mail server, 509 more "<item>0</item>" lines -->
>> <item>0</item>
>> </elems>
>> </queue>
>> </context>
>> </test>
>> </boost_serialization>
>> //=============================================
>>
>> Now I started refining, keeping the principle of not leaking
>> implementation details in mind.  The problem here is the array-
>> counts, which I don't need since they'll never change.  The first one
>> I can fix by writing each element separately:
>
> This is an artifact of our implementation of serialization of  
> arrays in xml.
> The count of elements is in fact redundant for a fixed size array.  If
> you wanted to eliminate it, the most transparent way would be to
> define your own serialization for array and use that instead of the
> one included in the library.
Where would that override be written?  And my override can't mess up  
anyone else's use of boost::array.  How would it differ from what I'm  
currently doing?  (Would it be better?  And if so, why?)
>> //=============================================
>> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
>> <!DOCTYPE boost_serialization>
>> <boost_serialization signature="serialization::archive" version="5">
>> <test class_id="0" tracking_level="0" version="0">
>> <context class_id="1" tracking_level="0" version="0">
>> <length>1</length>
>> <buffer-A>1732584193</buffer-A>
>> <buffer-B>4023233417</buffer-B>
>> <buffer-C>2562383102</buffer-C>
>> <buffer-D>271733878</buffer-D>
>> <message-tail class_id="2" tracking_level="0" version="0">
>> <elems>
>> <count>512</count>
>> <item>1</item>
>> <item>0</item>
>> <!-- 509 more "<item>0</item>" lines -->
>> <item>0</item>
>> </elems>
>> </message-tail>
>> </context>
>> </test>
>> </boost_serialization>
>> //=============================================
>>
>> I've always wanted to use something like a base-64 string encoding of
>> the bit array, because it's cool and it'd save space.  I added
>> conversion functions to/from the bit array and a std::string, and
>> then (de)serialized the string.
>
> you might look into "binary_object" which serializes its argument
> as a base64 text for text and xml archives and binary in binary  
> archives.
Hmm, I'm about to switch the implementation of the Boolean array from  
boost::array<bool, XXX> to boost::array<unsigned char, XXX/CHAR_BIT 
+1>, so "binary_object" may help.  I wonder what happens with a text  
or xml archive if the receiving computer has a different CHAR_BIT  
from the sending one, assuming I switch to an unsigned-char array and  
use binary_object.  I think that right now my base64 serialization  
will still work, since I force a particular bit order.  I would have  
to trust whatever bit-order binary_object uses.  (Can it be adjusted?)
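For illustration, here is one way a forced bit order could look when packing the Boolean array into unsigned chars.  This is only a sketch under my own assumptions (most-significant bit first, only the low 8 bits of each element used so the layout does not depend on CHAR_BIT); pack_bits and unpack_bits are hypothetical names, not anything from Boost:

```cpp
#include <cassert>
#include <cstddef>

// Pack "count" Booleans into octets, most-significant bit first.  Only the
// low 8 bits of each unsigned char are used, so the layout is the same on
// platforms with CHAR_BIT > 8.  "out" needs room for (count + 7) / 8 elements.
inline void  pack_bits( bool const *in, std::size_t count, unsigned char *out )
{
    for ( std::size_t i = 0 ; i < (count + 7) / 8 ; ++i )
        out[i] = 0;
    for ( std::size_t i = 0 ; i < count ; ++i )
        if ( in[i] )
            out[i / 8] |= static_cast<unsigned char>( 0x80u >> (i % 8) );
}

// Inverse of pack_bits: recover "count" Booleans from the packed octets.
inline void  unpack_bits( unsigned char const *in, std::size_t count, bool *out )
{
    for ( std::size_t i = 0 ; i < count ; ++i )
        out[i] = ( in[i / 8] & (0x80u >> (i % 8)) ) != 0;
}
```

Whether binary_object would preserve such a layout across machines with different CHAR_BIT values is exactly the open question above; the sketch only fixes the order on my side.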
>> I also had to separate "serialize"
>> into "save" and "load" since conversion is complementary, not
>> identical.  So now I have:
>>
>> //=============================================
>> <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
>> <!DOCTYPE boost_serialization>
>> <boost_serialization signature="serialization::archive" version="5">
>> <test class_id="0" tracking_level="0" version="0">
>> <context class_id="1" tracking_level="0" version="0">
>> <length>1</length>
>> <buffer-A>1732584193</buffer-A>
>> <buffer-B>4023233417</buffer-B>
>> <buffer-C>2562383102</buffer-C>
>> <buffer-D>271733878</buffer-D>
>> <message-tail>g</message-tail>
>> </context>
>> </test>
>> </boost_serialization>
>> //=============================================
[SNIP my further test cases]
>> If you want to see the actual work, look at revision/change-set
>> #48131 in Boost's Subversion set-up.  Now to the actual questions:
[SNIP question 1]
>> 2.  Before actually trying to serialize a string, I was worried that
>> the string's serialization would include a length count.  This would
>> be unnecessary because the object's "length" attribute already
>> implies the length of the string (int( ceil( double( length % 512 ) /
>> 6.0 ) )).  Here, we see that the string's length isn't explicitly
>> included in the XML archive, so I have no worries.  But what about
>> non-XML archives?  Will the string's length be directly serialized,
>> wasting space?  If so, how can I fix that?
>
> in xml no length
> in text - includes a length count - but arrays don't
> in binary - no length count
What does the second one mean?  That serializing a std::string will  
stick in a length but something like char[34] won't?  If so, does  
that mean serializing a char[34] will always use 34 entries, even if  
the stored string is shorter?
>> 3.  Having to add std::string to support serialization makes my class
>> header heavier. My class uses fixed-sized arrays, so is there any way
>> that I can avoid allocating a string?
>
> just use an array of characters - that is a fixed length, no overhead.
>
>> For writing out, could I set
>> up a char-array with the encoding and write that out?
>
> in the library, encoding - utf-8, locale, etc is determined by the
> stream attached to the archive.
I meant "encoding" in the general sense (not utf-8 vs. windows-1252  
vs. etc.).  The real Boolean array has fewer than 512 elements, which  
translates to between 0 and 86 sextets for a base64 string.  If I use  
a char[87] as scratch space to write the base64 encoding, will  
serializing it use 87 (or 86) entries or can I adjust it?  Note that  
a different field that I serialize can give me the length that I  
actually need for the base64 string.
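To make the arithmetic concrete, here is a sketch of the writing side with a char[87] as scratch space.  It assumes the standard base64 alphabet and that the first bit goes in the most-significant position of each sextet; sextet_count and encode_bits are hypothetical names.  Under those assumptions a single set bit encodes as "g", which agrees with the <message-tail> value in the archive above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Number of base64 characters needed for "bits" Booleans: ceil(bits / 6).
inline std::size_t  sextet_count( std::size_t bits )
{ return (bits + 5) / 6; }

// Encode the first "bits" entries of "queue" (bits < 512) into "out",
// first bit in the most-significant position of each sextet, zero-padding
// the last sextet, and NUL-terminate.  Returns the characters written.
inline std::size_t  encode_bits( bool const *queue, std::size_t bits,
 char (&out)[87] )
{
    static char const  alphabet[] =
     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t const  n = sextet_count( bits );

    assert( n < sizeof out );  // at most 86 characters plus the terminator
    for ( std::size_t i = 0 ; i < n ; ++i )
    {
        unsigned  v = 0;
        for ( std::size_t j = 0 ; j < 6 ; ++j )
        {
            std::size_t const  k = 6 * i + j;
            v = (v << 1) | ( (k < bits && queue[k]) ? 1u : 0u );
        }
        out[i] = alphabet[v];
    }
    out[n] = '\0';
    return n;
}
```

For 511 bits this gives 86 characters, so the 87th element is only ever the terminator.  What the archive does with the unused tail of the array is the part I still need answered.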
>> For reading
>> in, can I read the string in piecemeal to a char-array just in case
>> someone added more characters than required.
>
> There is no enforced requirement that the saving and loading have
> to be the "same".  I would recommend not dividing serialize
> into save/load unless it's necessary.  If it is, then I strive to make
> them symmetrical so that their correctness can be easily verified.
Well, I'm using a base64 encoding, so I have to use the separate but  
symmetrical technique.  Maybe going to an unsigned-char array and  
binary_object will make this obsolete.
>> My converter currently
>> ignores illegal characters and stops when enough legal characters
>> have been read.  If what I ask is possible, would the reading routine
>> have to seek to the end of the entry so further serialization isn't
>> messed up?
>
> For this you would have to add your own special sauce to the
> xml archive class.  This you could do by derivation.  It's  
> straightforward
> but ends up being somewhat tricky in practice.
I don't think I explained this right.  I'm talking about code to  
sanitize the received string within my serialization routines no  
matter the source, NOT a per-archive setting.  In my new notes about  
writing, I mentioned using a char[87] to serialize out the  
data.  When I serialize in, will your code make sure I get no more  
than 87 (or 86) entries, in case some blooper makes the string too  
long?  (I mentioned piecemeal in the old notes because that's the  
only way to read over-long strings without dynamic allocation [on my  
end].)  Actually, when I looked through the B.s11n code once, I  
think I saw something that will throw/crash if a serialized array  
is longer than the array's actual size.
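For reference, the sanitizing I have in mind on the reading side looks roughly like the sketch below.  It assumes the text has already arrived in a NUL-terminated buffer and uses the standard base64 alphabet with the first bit in the most-significant position; sextet_value and decode_bits are hypothetical names, and nothing here is Boost code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Value of a character in the standard base64 alphabet, or -1 if illegal.
inline int  sextet_value( char c )
{
    static char const  alphabet[] =
     "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    char const * const  p = c ? std::strchr( alphabet, c ) : 0;
    return p ? static_cast<int>( p - alphabet ) : -1;
}

// Decode up to "bits" Booleans from the NUL-terminated "text" into "queue".
// Illegal characters are skipped, and reading stops once enough legal
// characters have been consumed, so an over-long input cannot overrun
// "queue".  Returns the number of bits actually filled in.
inline std::size_t  decode_bits( char const *text, std::size_t bits,
 bool *queue )
{
    std::size_t  filled = 0;
    for ( ; *text && filled < bits ; ++text )
    {
        int const  v = sextet_value( *text );
        if ( v < 0 )
            continue;  // ignore anything outside the alphabet
        for ( int j = 5 ; j >= 0 && filled < bits ; --j )
            queue[filled++] = ( (v >> j) & 1 ) != 0;
    }
    return filled;
}
```

This protects my end once the characters are in hand.  It does not address how many characters the archive itself will hand over, which is the part that depends on the library.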
-- Daryle Walker Mac, Internet, and Video Game Junkie darylew AT hotmail DOT com