Make EnumerateDirectory stop looking for subfolders if condition is met

Time:12-07

I'm trying to find some directories on a network drive.

I use Directory.EnumerateDirectories for this. The problem is that it takes a very long time because there are many subdirectories.

Is there a way to make the function stop searching further down into subdirectories once a match is found, and carry on with the next directory on the same level?

static readonly Regex RegexValidDir = new ("[0-9]{4,}\\.[0-9]+$");
var dirs = Directory.EnumerateDirectories(startDir, "*.*", SearchOption.AllDirectories)
                .Where(x => RegexValidDir.IsMatch(x));

The directory structure looks like this:

a\b\20220902.1\c\d\
a\b\20220902.2\c\d\e
a\b\x\20220902.3\
a\b\x\20221004.1\c\
a\b\x\20221004.2\c\
a\b\x\20221004.3\d\e\f\
...
a\v\w\x\20221104.1\c\d
a\v\w\x\20221105.1\c\d
a\v\w\x\20221106.1\c\d
a\v\w\x\20221106.2\c\d
a\v\w\x\20221106.3\c\d
a\v\w\x\20221106.4\

I'm only interested in the directories with a date in the name, and I want to stop searching further down into the subdirectories of a matching directory.

Another thing is that I don't know if the search pattern I'm supplying (*.*) is correct for my usage scenario.

The directories are found relatively quickly, but it then takes another 11 minutes for the search function to complete.

CodePudding user response:

I don't think it's possible to prune the enumeration efficiently with the built-in Directory.EnumerateDirectories method in SearchOption.AllDirectories mode. My suggestion is to write a custom recursive iterator that lets you select the children of each individual item:

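// Lazily walks a tree depth-first. The childrenSelector returns the children
// of an item, or null to stop descending below that item.
static IEnumerable<T> Traverse<T>(IEnumerable<T> source,
    Func<T, IEnumerable<T>> childrenSelector)
{
    foreach (T item in source)
    {
        IEnumerable<T> children = childrenSelector(item);
        yield return item;
        if (children is null) continue; // Pruned: skip this item's subtree

        foreach (T child in Traverse(children, childrenSelector))
            yield return child;
    }
}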

Then for the directories that match the date pattern, you can just return null children, effectively stopping the recursion for those directories:

IEnumerable<string> query = Traverse(new[] { startDir }, path =>
{
    if (RegexValidDir.IsMatch(path)) return null; // Stop recursion
    return Directory.EnumerateDirectories(path);
}).Where(path => RegexValidDir.IsMatch(path));

This query is slightly inefficient because the RegexValidDir pattern is matched twice for each path (once in the childrenSelector and once in the predicate of the Where). If you want to optimize it, you could modify the Traverse method by replacing the childrenSelector with a more complex lambda that returns both the children and whether the item should be yielded by the iterator: Func<T, (IEnumerable<T>, bool)>. Alternatively, use Traverse as is, with T being (string, bool) instead of string.
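A minimal sketch of the first option, in case it helps. The names PrunedSearch, TraversePruned and FindMatchingDirs are hypothetical and made up for this illustration. The regex is passed in as a parameter, so the sketch makes no assumption about the exact pattern:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class PrunedSearch
{
    // Hypothetical variant of Traverse: the selector returns the children
    // (or null to stop descending) together with a flag that says whether
    // the item itself should be yielded.
    public static IEnumerable<T> TraversePruned<T>(IEnumerable<T> source,
        Func<T, (IEnumerable<T> Children, bool ShouldYield)> selector)
    {
        foreach (T item in source)
        {
            var (children, shouldYield) = selector(item);
            if (shouldYield) yield return item;
            if (children is null) continue; // Pruned: skip this item's subtree

            foreach (T child in TraversePruned(children, selector))
                yield return child;
        }
    }

    // Yields every directory under startDir whose path matches the pattern,
    // without descending into a directory once it has matched.
    // The pattern is evaluated once per visited path.
    public static IEnumerable<string> FindMatchingDirs(string startDir, Regex pattern)
    {
        return TraversePruned(new[] { startDir }, path =>
        {
            if (pattern.IsMatch(path))
                return (null, true); // Match: yield it, stop recursion
            return (Directory.EnumerateDirectories(path), false);
        });
    }
}
```

With the names from the question, the call would look like PrunedSearch.FindMatchingDirs(startDir, RegexValidDir). Like the original query, this also tests startDir itself against the pattern, and it does not handle directories that throw on access.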
