$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [GSoC] Some new ideas for this year's GSoC
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2010-04-06 16:10:10
Stefan wrote:
> * Optimize Boost String Algorithm finders to use an efficient substring
> matching algorithm as opposed to the naive algorithm. Most likely
> because of the fact that boost has implemented the finders for strings
> as generic algorithms, they haven't used efficient algorithms such as
> KMP, Boyer-Moore, Rabin-Karp etc.
> \see boost/algorithm/string/detail/finder.hpp
> The idea is to implement efficient algorithms such as the ones
> mentioned above to replace the current naive implementation. Note: it
> may be difficult to find nth matching substring position using some of
> these algorithms.
> Difficulty: easy-medium
>
> * Implement a string algorithm for generating the suffix array. The
> suffix array, after being precomputed on a certain string, allows for
> very efficient substring searches in the future. This is useful when the
> string doesn't change too often but there are plenty of substring to
> search against.
Hi Stefan,
Have you seen this thread, from a couple of years ago? :
http://thread.gmane.org/gmane.comp.lib.boost.devel/171098
I don't know what has changed in string_algo since then, but I think
there is probably plenty of opportunity for easy optimisation.
Do you have a motivating application for this? A lot of this stuff is
rather data-dependant, so having a particular application that can
provide a benchmark is a good idea. I guess that a suffix array
library would be reasonable for a GSoC project, if you're good! You
might also like to consider variants that index at e.g. word
boundaries; I have used something like that and it works well for some
kinds of search.
Regards, Phil.