$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: [boost] [gsoc-2013]Approximate string matching
From: Yu Jiang (sunlightt07_at_[hidden])
Date: 2013-04-23 04:26:35
Hello, I'm Yu Jiang, currently a master student from Tsinghua University in
China. I'm attracted by the idea of "approximate string matching" since
it's highly related about my research and also my interest.
I have seen in a previous discussion, another student proposed to implement
some basic matching functions such as "edit distance" for boost. He said by
using dynamic programming, the time complexity is O(m*n). However, if we
give a threshold T and only report the exact edit distance if it's no
larger than T, the complexity can be O(T * min(m,n)). I think sometimes we
just need to decide whether the edit distance is small enough.
Furthermore, I'm also interested in approximate string search. That is,
given a query string word Q and a candidate word set S, find all the
strings in S who have the edit distance no larger than a threshold T with
the query string. This is very useful in many area such as data
integration, and I have strong knowledge in fast algorithms to solve this.
So, is it compatible for gsoc 2013 and for boost? Looking forward to your
opinions.
Thanks,
Yu