Subject: [boost] RFC: edit_distance / edit_alignment library
From: Erik Erlandson (eje_at_[hidden])
Date: 2013-06-25 22:38:27


Boost Community,

I cooked up a library that provides a pair of functions: edit_distance() and edit_alignment(). The library currently lives here:
https://github.com/erikerlandson/edit_distance

Some short example programs are here:
https://github.com/erikerlandson/edit_distance/tree/master/examples

A few example expressions:

// the edit distance (aka levenshtein distance)
unsigned dist = edit_distance("abc", "bcd");

// using a custom cost function for insertion, deletion, substitution
unsigned dist = edit_distance("abc", "bcd", custom_cost_function());

// accepts arbitrary sequences, ranges, range adaptors
unsigned dist = edit_distance(as_vector("abc"), as_list("bcd") | boost::adaptors::reversed);

// obtain the edit distance and the list of the edit operation codes (in output iterator)
unsigned dist = edit_alignment("abc", "bcd", std::ostream_iterator<edit_opcode>(std::cout, "")).second;

// adaptors to add edit operation costs and sequence elements to the edit operation output
unsigned dist = acquire<elements>(acquire<costs>(edit_alignment))("abc", "bcd", std::ostream_iterator<boost::tuple<edit_opcode, unsigned, char, char> >(std::cout, "")).second;

I'm interested in community feedback on whether these would be a useful addition to boost::algorithms. I envision these as the first additions to an eventually-larger library of other kinds of string distances, dynamic programming, and other algorithms for sequence alignment.