$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [gsoc-2013] proposal for approximate string matching
From: Erik Erlandson (eerlands_at_[hidden])
Date: 2013-04-27 15:16:11
> > I agree to modify the return type for function (4). So now I think your
> suggestion for using a class to supply
> these 3 functions together is a better choice. In case (1)(2)(3), I think
> return size_type seems to be better since
> the algorithm itself can only output an integer as the result?
I think in most cases the result would be integer. I did once implement a variation where I used (1.5) for the cost of substitution, and (1.0) for insertion/deletion. I no longer remember the details, but I do recall that using (2.0) didn't work as well, it wanted to be (1.5). At any rate, the resulting distance in such a case would be floating point, not integer. If it's reasonably easy to support configurable compute/return types, including non-integers, I think it might prove useful.
I know that you are going to be focusing on actual strings for your applications, but there are other kinds of edit distance application, for example a use case like the unix diff command, where you are doing edit-distance-like computations on two sequences of lines of text, and in fact the edit cost between any two lines may itself be computed as an edit distance. So I think designing the routines to be as general as possible (within reason) will pay dividends for the boost community, and improve their adoption.