Subject: Re: [boost] NuDB: A fast key/value insert-only database for SSD drives in C++11
From: Lee Clagett (forum_at_[hidden])
Date: 2017-03-29 00:26:42
On Tue, 28 Mar 2017 08:59:10 -0400
Vinnie Falco via Boost <boost_at_[hidden]> wrote:
> On Tue, Mar 28, 2017 at 8:45 AM, Lee Clagett via Boost
> <boost_at_[hidden]> wrote:
> > ...writing the log header after its contents
> > could reduce the probability of an undetected incomplete write
>
> The recovery test simulates partial writes:
> https://github.com/vinniefalco/NuDB/blob/master/extras/nudb/test/fail_file.hpp#L292
This simulates a write I/O error, not a power failure. Even assuming
that data is fully stored on disk once `fsync` returns, the recovery
algorithm could open a log file for which `write` was called but
`fsync` never returned. That file could have enough "space" for a
bucket, but lack the proper contents of the bucket
itself. There is an inherent race between writing and the completion of
an `fsync` that will go unnoticed by the current recovery algorithm on
some filesystem configurations. The only "portable" fixes I've seen
are: (1) cryptographic hashes, (2) hoping that changing path to inode
mappings is all-or-nothing, or (3) hoping that _overwriting_ a single
sector last will be all-or-nothing. Both (2) and (3) still depend on
the filesystem + hardware AFAIK, BUT probably work with more filesystem
and hardware configurations.
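
As a rough illustration of fix (1), here is a minimal sketch of a block
that carries a digest of its own contents, so that a block with the
right length but the wrong bytes is rejected on recovery. The layout and
the names `digest`, `seal`, and `verify` are hypothetical and not part
of NuDB. FNV-1a is used only to keep the sketch dependency-free; it is
NOT a cryptographic hash, and a real implementation would substitute
something like SHA-256.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a, 64-bit. NOT cryptographic: a stand-in so the sketch needs no
// external library.
std::uint64_t digest(const std::uint8_t* p, std::size_t n) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical layout: [payload bytes][8-byte little-endian digest].
// True only if the stored digest matches the payload actually present.
bool verify(const std::vector<std::uint8_t>& block) {
    if (block.size() < 8) return false;
    const std::size_t n = block.size() - 8;
    std::uint64_t stored = 0;
    for (int i = 0; i < 8; ++i)
        stored |= static_cast<std::uint64_t>(block[n + i]) << (8 * i);
    return stored == digest(block.data(), n);
}

// Append the digest of the payload to produce the bytes to be written.
std::vector<std::uint8_t> seal(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out = payload;
    const std::uint64_t h = digest(payload.data(), payload.size());
    for (int i = 0; i < 8; ++i)
        out.push_back(static_cast<std::uint8_t>(h >> (8 * i)));
    assert(verify(out)); // sanity check on the freshly sealed block
    return out;
}
```

A file that was extended but whose data never reached the disk (for
example a zero-filled region of the right size) fails `verify`, which a
length check alone would not catch.
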
> > NuDB already has a file concept that needs documenting and
> > formalizing before any potential boost review.
>
> http://vinniefalco.github.io/nudb/nudb/types/File.html
>
I did miss this. Defining the concept in terms of "records" might be
more useful for storing a cryptographic hash in the manner Niall was
mentioning for SQLite. I think it could allow per-record corruption
detection, so that the entire DB isn't punted after an incomplete
write.
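
To make the per-record idea concrete, a minimal sketch follows. It
assumes fixed-size records and a caller-supplied integrity check (for
example a comparison against a stored digest); the name
`intact_records` and the record layout are hypothetical, not anything
in NuDB's File concept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical: the log is a sequence of fixed-size records, and `intact`
// decides whether one record's stored check matches its contents.
// Returns the indices of the records that pass, instead of rejecting
// the whole log when a single record fails.
std::vector<std::size_t> intact_records(
    const Bytes& log, std::size_t record_size,
    const std::function<bool(const std::uint8_t*, std::size_t)>& intact)
{
    assert(record_size > 0);
    std::vector<std::size_t> good;
    // A trailing partial record (an incomplete write) is ignored.
    const std::size_t count = log.size() / record_size;
    for (std::size_t i = 0; i < count; ++i)
        if (intact(log.data() + i * record_size, record_size))
            good.push_back(i);
    return good;
}
```

With something like this, recovery could keep the records that verify
and discard only the damaged ones.
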
Lee