From: Angus Leeming (angus.leeming_at_[hidden])
Date: 2004-10-01 05:27:55
Martin wrote:
>> An interesting idea and certainly much less work
>>
>> However, as I understand it, you're suggesting limiting the wildcards
>> simply to ensure that the filtered_directory_iterator behaves the same
>> on posix and windows systems?
>
> No. The main reason was to have a simple iterator for simple (and what I
> think is the most common) cases which also avoid the need to go via a
> list.
That's two separate requirements.
1. A simple iterator. By 'simple', you mean one using the underlying API.
Right?
2. Avoid the list in
list<path> glob(string const & pattern, path const & working_dir);
Actually, this second requirement contradicts the first, because
glob()'s results must be stored somewhere internally for the iterator to
then iterate over. No?
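To make the point concrete, here is a rough sketch of an iterator built on top of a glob()-style call. Everything in it is a stand-in for illustration: fake_glob returns canned results instead of touching the filesystem, std::string takes the place of filesystem::path, and list_backed_iterator is a hypothetical name, not the interface under discussion. It assumes a C++11 compiler. What it shows is only that the list does not go away; it moves inside the iterator.

```cpp
#include <list>
#include <memory>
#include <string>

// std::string stands in for filesystem::path in this sketch.
typedef std::list<std::string> path_list;

// Hypothetical stand-in for glob(): returns canned results so the
// example does not depend on the contents of any real directory.
path_list fake_glob(std::string const & /*pattern*/, std::string const & wd)
{
    path_list result;
    result.push_back(wd + "/a.hpp");
    result.push_back(wd + "/b.hpp");
    return result;
}

// Hypothetical iterator that owns the stored results.
class list_backed_iterator {
public:
    // A default-constructed iterator is the end iterator.
    list_backed_iterator() : results_(), pos_() {}

    list_backed_iterator(std::string const & pattern, std::string const & wd)
        : results_(std::make_shared<path_list>(fake_glob(pattern, wd)))
        , pos_(results_->cbegin())
    {
        // No matches at all: become the end iterator immediately.
        if (pos_ == results_->cend())
            results_.reset();
    }

    std::string const & operator*() const { return *pos_; }

    list_backed_iterator & operator++()
    {
        // Drop the stored list once the last element has been passed.
        if (++pos_ == results_->cend())
            results_.reset();
        return *this;
    }

    bool operator==(list_backed_iterator const & other) const
    {
        // End iterators hold no list and compare equal to each other only.
        if (!results_ || !other.results_)
            return !results_ && !other.results_;
        return results_ == other.results_ && pos_ == other.pos_;
    }

    bool operator!=(list_backed_iterator const & other) const
    {
        return !(*this == other);
    }

private:
    std::shared_ptr<path_list> results_;  // the list is still here
    path_list::const_iterator pos_;
};
```

A loop of the form "for (list_backed_iterator it(pattern, wd), end; it != end; ++it)" then visits each stored result once. The shared_ptr makes copies of the iterator cheap, but the full result set is still materialised before the first element is seen.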
> So did I but I put it into a separate iterator where you can define the
> rules completely independent of the filesystem.
This is, in essence, what I am proposing. I have now reworked the interface
following Gennadiy's suggestion. Here's a glob_iterator that can recurse
down directories:
class BOOST_GLOB_DECL glob_iterator
    : public iterator_facade<
          glob_iterator               // Derived type
        , filesystem::path const      // value_type
        , single_pass_traversal_tag
      >
{
public:
    glob_iterator() {}
    glob_iterator(std::string const & pattern,
                  filesystem::path const & wd,
                  glob_flags flags);
private:
    ...
};
It works, but is considerably slower than the function returning a list. No
doubt profiling will help track down what I'm doing inefficiently.
# A simple wrapper for the real glob()
$ time ./real_glob_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
934
real 0m0.042s
user 0m0.010s
sys 0m0.010s
# The glob() function I posted earlier in the week.
$ time ./glob_fun_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
934
real 0m0.099s
user 0m0.070s
sys 0m0.010s
# The new glob_iterator.
$ time ./glib_it_rls '*/*/*.hpp' '/home/angus/boost/cvs/' | wc -l
934
real 0m0.236s
user 0m0.200s
sys 0m0.010s
I'm never sure whether to pay attention to the 'real' or to the 'user'
times... Anyway, there's a clear hierarchy ATM.
>> Don't you ever search for things like "[a-d]*.{cxx,hpp}"?
>
> I do it in the shell but I have never had the need to do it inside an
> application. I'm sure there are such applications.
Here's one. Qt (QProcess), gtk (gspawn*) and ACE (ACE_Process) all enable
the user to spawn a child process in a portable way. However, what they
all lack is a *powerful* way to initialise their data from a string
containing a "command-line like" syntax.
(And, no, passing an arbitrary "ls `rm -f *` foo.cpp" to the system()
command isn't a viable alternative.)
I've been playing around writing something that can parse a subset of the
Bourne shell. Enough to make it easy and safe to launch a single process
from a string. "parse_pseudo_command_line" fills a "spawn_data" variable.
It's then simple to ascertain whether the request is safe or not.
Now *this* is a function that would benefit from a portable glob.
http://www.devel.lyx.org/~leeming/libs/child/doc/html/parse_pseudo_command_line.html
Equivalent URL: http://tinyurl.com/4c4v9
>> Also, how do you limit the wildcards? I take it you don't, but that the
>> underlying matcher (findfirstfile, glob) will behave differently on
>> receipt of the same pattern.
>
> The filesystems already behave differently since one is case-sensitive
> and the other is not. Anyway, I think it is reasonable to limit the
> wildcards to some portable syntax e.g. max 2 '*' are allowed and they
> must either be the last character or followed by a '.'.
Again, how do you *limit* them? That implies that you must prescan the
pattern, presumably throwing once you've determined that the thing is
breaking your "reasonable" limits.
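For concreteness, such a prescan might look something like the sketch below. The function names are hypothetical, and the rule encoded is just one reading of the limits quoted above: at most two '*', each of which is either the last character of the pattern or immediately followed by a '.'. Other metacharacters ('?', '[', '{') are not examined at all.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: prescan a pattern and throw if it breaks the
// "portable" limits. Assumed rule: at most two '*', and each '*' must be
// the last character of the pattern or be followed directly by '.'.
void check_portable_pattern(std::string const & pattern)
{
    std::size_t stars = 0;
    for (std::size_t i = 0; i != pattern.size(); ++i) {
        if (pattern[i] != '*')
            continue;

        if (++stars > 2)
            throw std::invalid_argument(
                "more than two '*' in pattern: " + pattern);

        bool const is_last = (i + 1 == pattern.size());
        if (!is_last && pattern[i + 1] != '.')
            throw std::invalid_argument(
                "'*' must be last or followed by '.': " + pattern);
    }
}

// Hypothetical non-throwing wrapper around the check above.
bool is_portable_pattern(std::string const & pattern)
{
    try {
        check_portable_pattern(pattern);
        return true;
    } catch (std::invalid_argument const &) {
        return false;
    }
}
```

Under this reading "*.hpp", "*.*" and "foo*" pass, while "a*b" and "*.*.*" are rejected. Either way the whole pattern has to be scanned up front before anything is handed to the underlying matcher.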
Regards,
Angus