$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: [boost] RFC: edit_distance / edit_alignment library
From: Erik Erlandson (eje_at_[hidden])
Date: 2013-06-25 22:38:27
Boost Community,
I cooked up a library that provides a pair of functions: edit_distance() and edit_alignment(). The library currently lives here:
https://github.com/erikerlandson/edit_distance
Some short example programs are here:
https://github.com/erikerlandson/edit_distance/tree/master/examples
A few example expressions:
// the edit distance (aka levenshtein distance)
unsigned dist = edit_distance("abc", "bcd");
// using a custom cost function for insertion, deletion, substitution
unsigned dist = edit_distance("abc", "bcd", custom_cost_function());
// accepts arbitrary sequences, ranges, range adaptors
unsigned dist = edit_distance(as_vector("abc"), as_list("bcd") | boost::adaptors::reversed);
// obtain the edit distance and the list of the edit operation codes (in output iterator)
unsigned dist = edit_alignment("abc", "bcd", std::ostream_iterator<edit_opcode>(std::cout, "")).second;
// adaptors to add edit operation costs and sequence elements to the edit operation output
unsigned dist = acquire<elements>(acquire<costs>(edit_alignment))("abc", "bcd", std::ostream_iterator<boost::tuple<edit_opcode, unsigned, char, char> >(std::cout, "")).second;
I'm interested in community feedback on whether these would be a useful addition to boost::algorithms. I envision these as the first additions to an eventually-larger library of other kinds of string distances, dynamic programming, and other algorithms for sequence alignment.