Subject: Re: [boost] RFC: edit_distance / edit_alignment library
From: Erik Erlandson (eje_at_[hidden])
Date: 2013-06-30 20:43:09


> I have had a quick look. It seems that you've implemented the
> Needleman-Wunsch
> algorithm, which takes quadratic time and space. There are less
> complex algorithms,
> at least for similar input sequences, e.g. Myers - "An O(ND) Difference
> Algorithm
> and Its Variations"; I have an implementation of that here:
> http://svn.chezphil.org/anyterm/trunk/src/diff.cc & .hh.
>
> What is your motivation for choosing Needleman-Wunsch?

It was the simplest algorithm to get going. I was interested in developing a nice boost-ish programmatic interface, and developing unit tests, so for the implementation I stuck with the easy algorithm that I was familiar with. Now I (or anybody else) can drop in better algorithms with some unit test assurance.

Plus, Needleman-Wunsch is pretty effective for applications involving relatively small strings or sequences. Not so much for applications like large file diffs, or DNA sequences, etc.

At any rate, I view the particular algorithm as somewhat fungible, provided that the interface design is appealing to people. If there is disagreement on that point, I'm interested to know that too.