$include_dir="/home/hyper-archives/boost/include"; include("$include_dir/msg-header.inc") ?>
Subject: Re: [boost] [GSoC] Some new ideas for this year's GSoC
From: Phil Endecott (spam_from_boost_dev_at_[hidden])
Date: 2010-04-06 16:10:10
Stefan wrote:
> * Optimize Boost String Algorithm finders to use an efficient substring 
> matching algorithm as opposed to the naive algorithm. Most likely 
> because of the fact that boost has implemented the finders for strings 
> as generic algorithms, they haven't used efficient algorithms such as 
> KMP, Boyer-Moore, Rabin-Karp etc.
>   \see boost/algorithm/string/detail/finder.hpp
>   The idea is to implement efficient algorithms such as the ones 
> mentioned above to replace the current naive implementation. Note: it 
> may be difficult to find nth matching substring position using some of 
> these algorithms.
>   Difficulty: easy-medium
>
> * Implement a string algorithm for generating the suffix array. The 
> suffix array, after being precomputed on a certain string, allows for 
> very efficient substring searches in the future. This is useful when the 
> string doesn't change too often but there are plenty of substring to 
> search against.
Hi Stefan,
Have you seen this thread, from a couple of years ago? :
     http://thread.gmane.org/gmane.comp.lib.boost.devel/171098
I don't know what has changed in string_algo since then, but I think 
there is probably plenty of opportunity for easy optimisation.
Do you have a motivating application for this?  A lot of this stuff is 
rather data-dependant, so having a particular application that can 
provide a benchmark is a good idea.  I guess that a suffix array 
library would be reasonable for a GSoC project, if you're good!  You 
might also like to consider variants that index at e.g. word 
boundaries; I have used something like that and it works well for some 
kinds of search.
Regards,  Phil.