List of files in all the directories using std::copy and async tasks in C++

Time:02-01

This is a program that does a directory tree listing using asynchronous tasks in C++.

My problem is this: 'vect' is created as a local variable in each function call, so each call should only hold the list of files in one directory. Yet at the end, all the files in all the directories are returned to main. How is that possible?

I mean, how can 'vect', which is local to each function call, keep the file names of every directory when each directory is handled by a separate function call? This 'vect' acts as if it were a global variable. Is it because of std::copy? I don't understand it.

#include <algorithm>
#include <filesystem>
#include <future>
#include <iostream>
#include <iterator>   // std::back_inserter
#include <vector>

typedef std::vector<std::filesystem::directory_entry> vectDirEntry;

vectDirEntry ListDirectory2(std::filesystem::directory_entry&& dirPath)
{
    std::vector<std::future<std::vector<std::filesystem::directory_entry>>> finalVect;
    vectDirEntry vect;

    for (const std::filesystem::directory_entry& entry : std::filesystem::directory_iterator(dirPath))
    {
        if (entry.is_directory())
        {
            std::future<vectDirEntry> fut = std::async(std::launch::async, &ListDirectory2, entry);
            finalVect.push_back(std::move(fut));
        }
        else if (entry.is_regular_file())
        {
            vect.push_back(entry);
        }
    }

    std::for_each(finalVect.begin(), finalVect.end(), [&](std::future<std::vector<std::filesystem::directory_entry>>& fut)
        {
            vectDirEntry lst = fut.get();
            std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
        }
    );
    return vect;
}


int main()
{
    const std::filesystem::directory_entry root = std::filesystem::directory_entry("C:/Test");
    std::future<std::vector<std::filesystem::directory_entry>> fut = std::async(std::launch::async, &ListDirectory2, root);
    auto result = fut.get();

    for (std::filesystem::directory_entry& item : result)
    {
        std::cout << item << '\n';
    }
}

CodePudding user response:

There is a separate vect for each recursive call. Each call returns its vect, though, and the future produced by std::async hands that returned vect to the caller. When you do:

        vectDirEntry lst = fut.get();
        std::copy(lst.begin(), lst.end(), std::back_inserter(vect));

for each of the futures dispatched with std::async, you consume the child call's vect to populate the parent's vect (which the parent in turn returns).

The lst in that code is the vect returned by one of your recursive calls. The vect in that std::copy is the vect from the current ListDirectory2 call, captured by reference (because you began the lambda definition with [&], which means any local variables from the outer scope that the lambda uses are captured by reference rather than copied).

There's nothing unusual here; you explicitly copied from the sub-vects into the parent vect before returning each time, eventually building up a final vect in the top-most ListDirectory2 call that contains the results from every recursive call.

As a side note, you're performing a number of copies that aren't strictly necessary. You could avoid at least some of them by replacing std::copy with std::move. In addition to the single-argument version in <utility> that casts its argument to an r-value reference, there's a three-argument version in <algorithm> that works like std::copy but moves from the source; since lst is destroyed at the end of each lambda call, there's no harm in leaving its elements in a moved-from state. A similar change could be made using the insert method of vect together with std::make_move_iterator (which might be slightly faster, because the vector can grow once for the whole range instead of element by element), but the simple swap from std::copy to std::move is the minimal change and should be fast enough.

CodePudding user response:

What you observe has nothing to do with async calls but is due to recursion.

Here's a flowchart describing it for 3 directory levels. Each vect is given a unique name here (and each one is a distinct instance in the program).

ListDirectory2(dir)
vect <- file1.1   // put all files in dir in the local vect
        file1.2
dir1 ---------------> ListDirectory2(dir1) // call ListDirectory2 for each dir
                      vect1 <- file1.1 // put all files in dir1 in the local vect
                               file1.2
                      dir1.1 ---------------> ListDirectory2(dir1.1)
                                              ...
                      vect1 <- std::copy <--- return vect1.1
                      dir1.2 ---------------> ListDirectory2(dir1.2)
                                              ...
                      vect1 <- std::copy <--- return vect1.2
vect <- std::copy <-- return vect1

dir2 ---------------> ListDirectory2(dir2)
                      vect2 <- file2.1 // put all files in dir2 in the local vect
                               file2.2
                      dir2.1 ---------------> ListDirectory2(dir2.1)
                                              ...
                      vect2 <- std::copy <--- return vect2.1
                      dir2.2 ---------------> ListDirectory2(dir2.2)
                                              ...
                      vect2 <- std::copy <--- return vect2.2
vect <- std::copy <-- return vect2
return vect

When the call returns to main, vect will therefore be populated with all the files encountered from the starting directory and down.
