From: Angus Leeming (angus.leeming_at_[hidden])
Date: 2004-09-27 15:33:55


Gennadiy Rozental wrote:
>> Would you be happy if I changed the interface to:
>>
>> std::list<fs::path> matches;
>> boost::glob(matches, "../te?t/foo*bar", starting_directory);
>
> It slightly better, but I still prefer iterator interface. What I am
> going to do with above list? Most probably I a mgoing to iterate through
> it to do somethins, or find simething. What if I only need first matching
> element? We have several years expirience now with container/iterator
> alternatives. I believe iterator interface is more natual here (and
> ultimately more efficient).

Don't get me wrong, I'm not disagreeing. It's just that I'm on a steep
learning curve here.

Maybe it helps us both if I present some code. The first thing that
boost::glob does is parse the "../foo*bar/*.cpp" pattern into a list of
predicates, one per component in the pattern tree (3 in this case):

        std::list<detail::glob_predicate<> > predicates;

        bool contains_wildcards = false;
        for (fs::path::iterator it = glob_path.begin(),
             end = glob_path.end();
             it != end; ++it) {
            detail::glob_predicate<> const predicate(*it, flags);
            contains_wildcards += predicate.contains_wildcards();
            predicates.push_back(predicate);
        }

The glob_predicate is a functor that is used by glob_iterator to filter
files:

template <typename TraitsT = glob_traits>
struct glob_predicate {
    /** @returns true if @c p.leaf() matches the glob expression. */
    bool operator()(filesystem::path const & p) const
};

However, if the pattern I'm trying to match doesn't contain any wild cards
at all, then there's no need for any iteration at all:

        std::list<fs::path> matches;
        if (contains_wildcards) {
            glob_impl(matches, predicates.begin(), prior(predicates.end()),
                      working_dir, flags);

        } else {
            fs::path abs_path = working_dir / glob_path;
            if (exists(abs_path))
                matches.push_back(abs_path);
        }

That doesn't seem to fit nicely with an iterator interface to me, but I
emphasise I'm coming from a position of zero knowledge.

Similarly, glob_impl is a recursive function that calls itself for each
component in the "predicates" tree. At each level, it checks whether that
component contains a wild card. If not, it just adds the component to the
matching path:

void glob_impl(std::list<fs::path> & matches,
               predicate_vec::const_iterator first,
               predicate_vec::const_iterator last,
               fs::path const & working_dir,
               glob_flags flags)
{
    detail::glob_predicate<> const & predicate = *first;

    if (first == last) {
        // We've reached the destination directory.
        // Add any matches to @c matches.
        if (!predicate.contains_wildcards()) {
            fs::path const candidate =
                working_dir / fs::path(predicate.file(), fs::no_check);

            if (exists(candidate) &&
                (!(flags & glob_onlydir) || is_directory(candidate)))
                    matches.push_back(candidate);

        } else {
            detail::glob_iterator git(predicate, working_dir);
            detail::glob_iterator const gend;
            for (; git != gend; ++git) {
                if (!(flags & glob_onlydir) || is_directory(*git))
                    matches.push_back(*git);
            }
        }

        return;
    }

    if (!predicate.contains_wildcards()) {
        // No testing for the existence of the directory.
        // That would place additional, unnecessary requirements on glob().
        glob_impl(matches, next(first), last,
                  working_dir / fs::path(predicate.file(), fs::no_check),
                  flags);

    } else {
        detail::glob_iterator git(predicate, working_dir);
        detail::glob_iterator const gend;
        for (; git != gend; ++git) {
            if (is_directory(*git))
                glob_impl(matches, next(first), last, *git, flags);
        }
    }
}
        

Now, presumably you're going to come back at me and tell me that this is
all just implementation detail. That's true, it is. But my problem is that
I don't yet see how I could attack the problem more efficiently using some
other approach.

Does any of this make sense?

Regards,
Angus