From: Angus Leeming (angus.leeming_at_[hidden])
Date: 2004-10-01 05:27:55


Martin wrote:

>> An interesting idea and certainly much less work
>>
>> However, as I understand it, you're suggesting limiting the wildcards
>> simply to ensure that the filtered_directory_iterator behaves the same
>> on posix and windows systems?
>
> No. The main reason was to have a simple iterator for simple (and what I
> think is the most common) cases which also avoid the need to go via a
> list.

That's two separate requirements.
1. A simple iterator. By 'simple', you mean one using the underlying API.
   Right?

2. Avoid the list in

list<path> glob(string const & pattern, path const & working_dir);

Actually, this second requirement is contradictory to the first because
glob()'s results must be stored internally for the iterator to then
iterate over. No?
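
One way to read that point: if the iterator is built on top of a glob() that returns a list, the list does not go away; it just moves inside the iterator. The following is a minimal sketch of that idea, not the code under discussion. The class name stored_glob_iterator is hypothetical, and std::string stands in for filesystem::path so that the example has no Boost dependency.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical illustration: an input-iterator-like class that owns the
// complete list of matches and merely walks over it.
class stored_glob_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = std::string;
    using difference_type = std::ptrdiff_t;
    using pointer = std::string const *;
    using reference = std::string const &;

    // A default-constructed iterator is the end iterator.
    stored_glob_iterator() = default;

    // 'matches' is whatever a list-returning glob() produced.
    explicit stored_glob_iterator(std::list<std::string> matches)
        : matches_(std::make_shared<std::list<std::string>>(std::move(matches)))
        , pos_(matches_->begin())
    {
        if (pos_ == matches_->end())
            matches_.reset();  // no matches: become the end iterator
    }

    reference operator*() const
    {
        assert(matches_);  // dereferencing the end iterator is an error
        return *pos_;
    }

    stored_glob_iterator & operator++()
    {
        assert(matches_);
        if (++pos_ == matches_->end())
            matches_.reset();  // exhausted: become the end iterator
        return *this;
    }

    friend bool operator==(stored_glob_iterator const & a,
                           stored_glob_iterator const & b)
    {
        if (!a.matches_ || !b.matches_)
            return a.matches_ == b.matches_;  // equal only if both are end
        return a.matches_ == b.matches_ && a.pos_ == b.pos_;
    }

    friend bool operator!=(stored_glob_iterator const & a,
                           stored_glob_iterator const & b)
    {
        return !(a == b);
    }

private:
    // The stored results; shared so that copies of the iterator stay cheap.
    std::shared_ptr<std::list<std::string> const> matches_;
    std::list<std::string>::const_iterator pos_;
};
```

The caller never sees a list, but every match has still been collected before the first dereference, which is the tension described above.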

> So did I but I put it into a separate iterator where you can define the
> rules completely independent of the filesystem.

This is, in essence, what I am proposing. I have now reworked the interface
following Gennadiy's suggestion. Here's a glob_iterator that can recurse
down directories:

class BOOST_GLOB_DECL glob_iterator
    : public iterator_facade<
                 glob_iterator // Derived type
               , filesystem::path const // value_type
               , single_pass_traversal_tag
             >
{
public:
    glob_iterator() {}
    glob_iterator(std::string const & pattern,
                  filesystem::path const & wd,
                  glob_flags flags);
private:
    ...
};

It works, but is considerably slower than the function returning a list. No
doubt profiling will help track down what I'm doing inefficiently.

# A simple wrapper for the real glob()
$ time ./real_glob_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934
real 0m0.042s
user 0m0.010s
sys 0m0.010s

# The glob() function I posted earlier in the week.
$ time ./glob_fun_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934
real 0m0.099s
user 0m0.070s
sys 0m0.010s

# The new glob_iterator.
$ time ./glib_it_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
    934
real 0m0.236s
user 0m0.200s
sys 0m0.010s

I'm never sure whether to pay attention to the 'real' or to the 'user'
times... Anyway, there's a clear hierarchy ATM.
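
For reference, 'real' is elapsed wall-clock time, while 'user' and 'sys' are CPU time spent in user-mode code and in the kernel respectively. In the figures above all of them rank the three programs the same way. The sketch below shows the difference from inside a program; the names timings, measure and sleep_briefly are hypothetical. It assumes a POSIX-like platform where std::clock() reports processor time (on some platforms it reports elapsed time instead).

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical helper types for illustration only.
struct timings {
    double wall_seconds;  // comparable to 'real'
    double cpu_seconds;   // comparable to 'user' + 'sys' on POSIX-like systems
};

// Runs f once and reports both elapsed time and processor time.
template <typename F>
timings measure(F f)
{
    auto const wall_start = std::chrono::steady_clock::now();
    std::clock_t const cpu_start = std::clock();
    f();
    std::clock_t const cpu_end = std::clock();
    auto const wall_end = std::chrono::steady_clock::now();

    timings t;
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// A task that waits without computing: wall time passes while CPU time
// barely moves.
inline void sleep_briefly()
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

Calling measure(sleep_briefly) should give a wall time of at least 50ms and a much smaller CPU time. For CPU-bound code such as pattern matching the CPU figure is usually the steadier one to compare, since 'real' also includes time spent waiting on the disk or on other processes.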

>> Don't you ever search for things like "[a-d]*.{cxx,hpp}"?
>
> I do it in the shell but I have never had the need to do it inside an
> application. I'm sure there are such applications.

Here's one. Qt (QProcess), gtk (gspawn*) and ACE (ACE_Process) all enable
the user to spawn a child process in a portable way. However, what they
all lack is a *powerful* way to initialise their data from a string
containing a "command-line like" syntax.

(And, no, passing an arbitrary "ls `rm -f *` foo.cpp" to the system()
command isn't a viable alternative.)

I've been playing around writing something that can parse a subset of the
Bourne shell. Enough to make it easy and safe to launch a single process
from a string. "parse_pseudo_command_line" fills a "spawn_data" variable.
It's then simple to ascertain whether the request is safe or not.

Now *this* is a function that would benefit from a portable glob.

http://www.devel.lyx.org/~leeming/libs/child/doc/html/parse_pseudo_command_line.html
Equivalent URL: http://tinyurl.com/4c4v9

>> Also, how do you limit the wildcards? I take it you don't, but that the
>> underlying matcher (findfirstfile, glob) will behave differently on
>> receipt of the same pattern.
>
> The filesystems already behave differently since one is case-sensitive
> and the other is not. Anyway, I think it is reasonable to limit the
> wildcards to some portable syntax e.g. max 2 '*' are allowed and they
> must either be the last character or followed by a '.'.

Again, how do you *limit* them? That implies that you must prescan the
pattern, presumably throwing once you've determined that the thing is
breaking your "reasonable" limits.
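
To make the question concrete, such a prescan might look like the sketch below. It is only an illustration of the rule as quoted (at most two '*', each either the last character or followed by a '.'). The function name check_portable_pattern and the choice of std::invalid_argument are assumptions, not part of any posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical prescan: throws if 'pattern' falls outside the quoted
// "portable" subset. Only '*' is checked; other metacharacters such as
// '?', '[' or '{' are not examined.
void check_portable_pattern(std::string const & pattern)
{
    std::size_t stars = 0;
    for (std::size_t i = 0; i != pattern.size(); ++i) {
        if (pattern[i] != '*')
            continue;

        if (++stars > 2)
            throw std::invalid_argument("more than two '*' in pattern");

        bool const is_last = (i + 1 == pattern.size());
        if (!is_last && pattern[i + 1] != '.')
            throw std::invalid_argument(
                "'*' must end the pattern or be followed by '.'");
    }
}
```

Under this rule "*.hpp", "foo*" and "*.*" pass, while "a*b" and the '*/*/*.hpp' pattern used in the timings above would throw. Note that "[a-d]*.{cxx,hpp}" also passes, because the quoted rule constrains only '*'.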

Regards,
Angus