Subject: [Boost-users] [Serialization] Segfault while serializing derived pointers using multi DLLs
From: François Mauger (mauger_at_[hidden])
Date: 2011-03-17 09:29:10
Hi all (and particularly Boost/(De-)Serializer),
I use Boost 1.44, gcc 4.4.1, Linux.
The problem:
I have two homemade libraries built as shared libraries (DLLs) on Linux:
- 'datatools' provides the 'libdatatools.so' DLL
- 'brio' provides the 'libbrio.so' DLL
I use the Boost serialization features for derived pointers.
Below are the details:
STEP 1:
'datatools' is the base library. It defines:
- its own namespace: 'datatools'
- an abstract class (interface) named 'i_serializable'
  from which all other serializable classes should inherit
  in order to benefit from the (de)serialization mechanism through
  pointers to this base class
  [see http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/serialization.html#derivedpointers]
- some concrete classes (A, B, C) that inherit from the 'i_serializable'
  interface and register themselves using
  the export key features described in
  http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/special.html#export
A typical inheritance diagram looks like:
<pre>
datatools::i_serializable
|
+--------------+--------------+
|              |              |             
datatools::A   datatools::B   datatools::C
|
datatools::A'  
</pre>
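As an aside, the idea behind registering classes under an export key can be pictured with a toy model. The sketch below is not Boost's implementation: 'registry', 'registrar' and 'create' are hypothetical names, and plain standard-library C++ stands in for the export machinery. It only shows how a string GUID recorded at static-initialization time lets code recreate the right derived class through a pointer to the base class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy model only: every name below is hypothetical, none is Boost API.
struct i_serializable {
  virtual ~i_serializable() {}
  virtual std::string guid() const = 0;
};

typedef std::function<std::unique_ptr<i_serializable>()> factory_t;

// Function-local static: built on first use, so registrars running during
// static initialization never see an unconstructed map.
std::map<std::string, factory_t>& registry() {
  static std::map<std::string, factory_t> instance;
  return instance;
}

// Records a factory under a GUID when a static instance is constructed.
struct registrar {
  registrar(const std::string& guid, factory_t make) {
    registry()[guid] = std::move(make);
  }
};

struct A : i_serializable {
  std::string guid() const override { return "datatools::A"; }
};
struct D : i_serializable {
  std::string guid() const override { return "brio::D"; }
};

// One static registrar per class plays the role of the export step.
static registrar register_A("datatools::A", [] {
  return std::unique_ptr<i_serializable>(new A);
});
static registrar register_D("brio::D", [] {
  return std::unique_ptr<i_serializable>(new D);
});

// Recreate an object from the GUID that a reader would find in an archive.
std::unique_ptr<i_serializable> create(const std::string& guid) {
  auto it = registry().find(guid);
  if (it == registry().end()) return std::unique_ptr<i_serializable>();
  return it->second();
}

// Small self-check: known GUIDs resolve, unknown ones do not.
bool self_check() {
  assert(create("brio::D")->guid() == "brio::D");
  return create("no::such::class") == nullptr;
}
```

In this model the registrars for 'datatools::A' and 'brio::D' could live in different libraries; what matters is that each one runs before the first lookup and that the map outlives every user.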
Here is the typical model of the 'A.hpp' header file for the A class:
<pre>
...
#include <datatools/serialization/archives_list.hpp>  // include Boost/Serialization text/XML/binary archives
#include <datatools/serialization/i_serializable.hpp> // include the abstract mother interface class
...
namespace datatools {
  class A: public i_serializable
  {
    // blah-blah.. (other members)
    // no inline code (from http://www.boost.org/doc/libs/1_46_1/libs/serialization/doc/special.html#dlls)
    template<class Archive>
    void serialize (Archive & ar,      
                    const unsigned int version);
  };
}
// register the class with a specific GUID:
BOOST_CLASS_EXPORT_KEY2 (datatools::A, "datatools::A");
</pre>
Here is the model of the 'A.cpp' implementation file:
<pre>
namespace datatools {
...
template<class Archive>
void A::serialize (Archive & ar,      
                   const unsigned int version)
{
  ar & boost::serialization::make_nvp(
         "datatools__serialization__i_serializable",
         boost::serialization::base_object<datatools::serialization::i_serializable>(*this)
       );
  ar & more data (with NVP stuff)...;
}
} // end of namespace datatools
BOOST_CLASS_EXPORT_IMPLEMENT(datatools::A)
// explicit instantiation for all kinds of known archives:
#include <datatools/serialization/archives_list.hpp> // include the known text/XML/binary archives
template void datatools::A::serialize(boost::archive::text_oarchive & ar, const unsigned int version);
template void datatools::A::serialize(boost::archive::text_iarchive & ar, const unsigned int version);
template void datatools::A::serialize(boost::archive::xml_oarchive & ar, const unsigned int version);
... more...
</pre>
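The header/implementation split used above (template member declared in the header, defined in the .cpp file, then explicitly instantiated for each archive type) can be tried in isolation. The following is a minimal single-file sketch that assumes two hypothetical stand-in archive types, 'text_out' and 'text_in', instead of the real Boost archives; the class 'A' here is a toy with a single integer member.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-ins for an output and an input archive.
struct text_out {
  std::ostringstream os;
  template <class T>
  text_out& operator&(const T& value) { os << value << ' '; return *this; }
};
struct text_in {
  std::istringstream is;
  explicit text_in(const std::string& text) : is(text) {}
  template <class T>
  text_in& operator&(T& value) { is >> value; return *this; }
};

// --- what would live in A.hpp: declaration only, no inline body ---
class A {
 public:
  A() : x(0) {}
  int x;
  template <class Archive>
  void serialize(Archive& ar, const unsigned int version);
};

// --- what would live in A.cpp: the definition ... ---
template <class Archive>
void A::serialize(Archive& ar, const unsigned int /*version*/) {
  ar & x;
}

// --- ... followed by one explicit instantiation per archive type ---
template void A::serialize(text_out& ar, const unsigned int version);
template void A::serialize(text_in& ar, const unsigned int version);

// Write an A to text, read it back, and return the restored value.
int round_trip(int value) {
  A saved;
  saved.x = value;
  text_out out;
  saved.serialize(out, 0u);

  A loaded;
  text_in in(out.os.str());
  loaded.serialize(in, 0u);
  assert(in.is);  // the read must have succeeded
  return loaded.x;
}
```

Because the definition is only visible in the implementation file, any archive type that is not explicitly instantiated there would produce an unresolved symbol at link time, which is why the list of instantiations has to cover every archive in use.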
Finally, I can compile all of this using gcc and build the
'libdatatools.so' library, which I prepend to my
LD_LIBRARY_PATH. Everything looks fine.
A test program 'prg1.cpp' that links only against 'libdatatools.so'
and 'libboost_serialization.so' works perfectly, serializing and
deserializing any collection of pointers to A, B, or C classes without
problem. A must! Thanks to Robert for that magic!
At this point, everything looks (is?) fine. Note that I have followed
(in principle) all the guidelines provided by Robert.
STEP 2:
Now let's consider the actual problem! As said before, my
'datatools' library is the base of a modular project with
other libraries that depend on 'datatools' (and Boost/Serialization).
The 'brio' library is such a beast:
<pre>
Boost/Serialization
|
datatools
|
brio
</pre>
It has its own namespace: 'brio'
It provides a few other dedicated classes, inherited from the
'datatools::i_serializable' abstract class, which are serializable
via Boost.
Let's consider the serializable 'brio::D' class, designed on the model
of 'datatools::A' and using the same implementation recommendations.
I have followed the guidelines used for the 'datatools::A' class to
write both the 'D.hpp' and 'D.cpp' files.
Now the inheritance scheme is:
<pre>
datatools::i_serializable                  :
|                                          :   
+--------------+--------------+------------+-----+- - - -
|              |              |            :     |
datatools::A   datatools::B   datatools::C :     brio::D
|                                          : 
datatools::A'                              : 
                libdatatools.so scope      :  libbrio.so scope
                                           : 
</pre>
I can compile the 'libbrio.so' DLL without any problem.
Now  I want  to run  a sample  program 'prg2.cpp'  that  performs some
(de)serialization   operations  on   a  collection   of   pointers  to
'datatools::A',    'datatools::B',   'datatools::C'    AND   'brio::D'
instances.   This program  is linked  against the  following libraries
(among others):
- libbrio.so    
- libdatatools.so    
- libboost_serialization.so
Well, it compiles perfectly. Note that this program also links against
third-party libraries, some of which explicitly use
'dlopen' and 'dlclose' to satisfy internal and critical features that
are out of my scope. I have no idea whether this can have side effects.
However, when I run it, I observe the following behaviour:
- all (de)serialization operations are done properly
  and I get files with embedded (text/XML...) portable archives that can
  be reloaded without problem.
- at the END of the program, while some cleanup code is invoked (some
  kind of deeply buried code beyond my skills and understanding), I get
  a segmentation fault.
Here is a dump of the GDB backtrace:
<pre>
Program received signal SIGSEGV, Segmentation fault.
0x02647c78 in boost::serialization::typeid_system::extended_type_info_typeid_0::is_less_than(boost::serialization::extended_type_info const&) const () from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
(gdb) bt
#0  0x02647c78 in boost::serialization::typeid_system::extended_type_info_typeid_0::is_less_than(boost::serialization::extended_type_info const&) const ()
   from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
#1  0x0264737b in boost::serialization::extended_type_info::operator<(boost::serialization::extended_type_info const&) const () from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
#2  0x0264dcac in boost::serialization::void_cast_detail::void_caster::operator<(boost::serialization::void_cast_detail::void_caster const&) const () from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
#3  0x0264e56d in boost::serialization::void_cast_detail::void_caster::recursive_unregister() const ()
   from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
#4  0x0264ed8d in boost::serialization::void_cast_detail::void_caster_shortcut::~void_caster_shortcut() ()
   from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
#5  0x0264e5ee in boost::serialization::void_cast_detail::void_caster::recursive_unregister() const ()
   from /scratch/sw/boost/install-1_44_0-Linux-i686-gcc44/lib/libboost_serialization.so.1.44.0
#6  0x024a6da7 in boost::serialization::void_cast_detail::void_caster_primitive<datatools::test::more_data_t, datatools::test::data_t>::~void_caster_primitive() ()
   from /home/mauger/Private/Work/lpc_nemo_svn/sw/datatools/datatools_trunk/Linux-i686/lib/libdatatools.so
#7  0x024a6f04 in boost::serialization::detail::singleton_wrapper<boost::serialization::void_cast_detail::void_caster_primitive<datatools::test::more_data_t, datatools::test::data_t> >::~singleton_wrapper() ()
   from /home/mauger/Private/Work/lpc_nemo_svn/sw/datatools/datatools_trunk/Linux-i686/lib/libdatatools.so
#8  0x0298c428 in __cxa_finalize (d=0x2609830) at cxa_finalize.c:56
#9  0x02426f04 in __do_global_dtors_aux ()
   from /home/mauger/Private/Work/lpc_nemo_svn/sw/datatools/datatools_trunk/Linux-i686/lib/libdatatools.so
#10 0x02534100 in _fini ()
   from /home/mauger/Private/Work/lpc_nemo_svn/sw/datatools/datatools_trunk/Linux-i686/lib/libdatatools.so
#11 0x0011dee6 in _dl_fini () at dl-fini.c:248
#12 0x0298c05f in __run_exit_handlers (status=0, listp=0x2a9e304, run_list_atexit=true) at exit.c:78
#13 0x0298c0cf in *__GI_exit (status=0) at exit.c:100
#14 0x02973b5e in __libc_start_main (main=0x8059495 <main>, argc=1, ubp_av=0xbfffcfd4, init=0x8061c40 <__libc_csu_init>,
    fini=0x8061c30 <__libc_csu_fini>, rtld_fini=0x11dcc0 <_dl_fini>, stack_end=0xbfffcfcc) at libc-start.c:252
#15 0x080592c1 in _start () at ../sysdeps/i386/elf/start.S:119
</pre>
If one ignores the nasty details of this stack (local paths and
names), one observes that the problem seems to be related to the
unregistration of some Boost/Serialization material. It occurs while
the executable is destroying a singleton_wrapper template
instance that manages some serializable classes from the 'datatools'
library:
- class 'datatools::test::more_data_t' (call it A')
- and its mother class 'datatools::test::data_t' (call it A), inherited
  from 'datatools::i_serializable'.
I expect such a singleton to be a static instance attached to some DLL. Am
I wrong? If not, which DLL is concerned: 'libdatatools.so' or
'libbrio.so'? My feeling is that I have a problem with some
arbitrary order of library unloading and the messy unregistration that
comes with it. Unless there is a specific order in which to aggregate modules
within a DLL (A.o B.o A'.o...). Unfortunately, my skills are too
limited to form a better
idea and find a solution. There are some comments by Robert concerning
such possible problems, but I'm not sure they apply to my case.
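To make the teardown-order worry concrete, here is a toy model in plain C++. It does not reproduce the crash above and says nothing about Boost's internals: 'caster_registry', 'entry', 'live_registry' and 'simulate' are hypothetical names. It only illustrates the general hazard of an object that unregisters itself from a registry in its destructor when that registry may already have been destroyed, together with one defensive pattern (check that the registry is still alive first).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical names, not Boost's internals.
class caster_registry;

// Points at the live registry, or is null once it has been destroyed.
caster_registry*& live_registry() {
  static caster_registry* ptr = nullptr;
  return ptr;
}

class caster_registry {
 public:
  caster_registry() { live_registry() = this; }
  ~caster_registry() { live_registry() = nullptr; }
  void add(const std::string& key) { keys_.insert(key); }
  void remove(const std::string& key) { keys_.erase(key); }
  std::size_t size() const { return keys_.size(); }
 private:
  std::set<std::string> keys_;
};

// Registers itself on construction and unregisters on destruction,
// but only if the registry is still alive.
class entry {
 public:
  entry(std::string key, std::vector<std::string>& log)
      : key_(std::move(key)), log_(log) {
    assert(live_registry() != nullptr);
    live_registry()->add(key_);
  }
  ~entry() {
    if (caster_registry* reg = live_registry()) {
      reg->remove(key_);
      log_.push_back("unregistered " + key_);
    } else {
      log_.push_back("skipped " + key_);  // registry already gone
    }
  }
 private:
  std::string key_;
  std::vector<std::string>& log_;
};

// Destroys the registry and one entry in either order and reports
// what the entry's destructor did.
std::vector<std::string> simulate(bool registry_dies_first) {
  std::vector<std::string> log;
  std::unique_ptr<caster_registry> reg(new caster_registry);
  std::unique_ptr<entry> item(new entry("datatools::A", log));
  if (registry_dies_first) {
    reg.reset();   // the entry outlives the registry
    item.reset();
  } else {
    item.reset();  // the entry goes first
    reg.reset();
  }
  return log;
}
```

Without the liveness check, the entry's destructor in the first branch would call into a registry that no longer exists, which is the kind of access that can end in a segmentation fault at exit.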
So I would really appreciate it if someone could advise me and possibly
give me some hints.
Thanks a lot for your attention and help.
Apologies for this rather long and technical message.
Regards
frc
-- 
François Mauger
Groupe "Interactions Fondamentales et Nature du Neutrino"
NEMO-3/SuperNEMO Collaboration
LPC Caen-CNRS/IN2P3-UCBN-ENSICAEN
Département de Physique -- Université de Caen Basse-Normandie
Adresse/address:
  Laboratoire de Physique Corpusculaire de Caen (UMR 6534)
  ENSICAEN
  6, Boulevard du Marechal Juin
  14050 CAEN Cedex
  FRANCE
Courriel/e-mail: mauger_at_[hidden]
Tél./phone: 02 31 45 25 12 / (+33) 2 31 45 25 12
Fax: 02 31 45 25 49 / (+33) 2 31 45 25 49