How to iterate through a directory_iterator in parallel?

Time:12-15

std::filesystem::directory_iterator is a LegacyInputIterator, and apparently it can't be used in a parallel std::for_each, whose execution-policy overloads require at least forward iterators.

I can iterate through the directory_iterator, get the items, place them in a vector and use that vector for parallel iteration.

Can the above step be omitted? Is there a way to iterate through a directory_iterator in parallel like this:

std::for_each(
    std::execution::par_unseq, // This is ignored currently
    std::filesystem::begin(dir_it),
    std::filesystem::end(dir_it),
    func
);

CodePudding user response:

directory_iterator is an input iterator, which means it generates values during the traversal. Furthermore, multiple traversals over the same directory may produce different sequences of values (both in terms of order and values themselves), which means the traversal is not restartable.

For parallel algorithms this means that the sequence cannot be partitioned: the iteration itself must happen sequentially, in one thread. The only opportunity to parallelize the processing is to offload the execution of func to separate thread(s), which may or may not be efficient. Filesystem iteration is expensive, and may be even more expensive than the processing in func. In that case you may observe func being called effectively sequentially, because each call completes before the next iterator increment does.

A standard library implementation is permitted to ignore the execution policy argument and execute the algorithm serially. For example, the implementation may simply not bother parallelizing the function calls if the input sequence cannot be partitioned.
