This is a program that does a directory tree listing using asynchronous tasks in C++.
My problem is that in each function call the variable `vect` is created as a local variable, and each call holds the list of files in just one directory, but at the end all the files in all the directories are returned to `main`! How is that possible?
I mean, how does `vect`, which is local to each function call, end up holding the file names of every directory, each of which was listed by a separate function call?! `vect` acts as if it were a global variable. Is it because of `std::copy`? I don't understand it!
```cpp
#include <algorithm>
#include <filesystem>
#include <future>
#include <iostream>
#include <iterator> // std::back_inserter
#include <vector>

typedef std::vector<std::filesystem::directory_entry> vectDirEntry;

vectDirEntry ListDirectory2(std::filesystem::directory_entry&& dirPath)
{
    std::vector<std::future<std::vector<std::filesystem::directory_entry>>> finalVect;
    vectDirEntry vect;
    for (const std::filesystem::directory_entry& entry : std::filesystem::directory_iterator(dirPath))
    {
        if (entry.is_directory())
        {
            std::future<vectDirEntry> fut = std::async(std::launch::async, &ListDirectory2, entry);
            finalVect.push_back(std::move(fut));
        }
        else if (entry.is_regular_file())
        {
            vect.push_back(entry);
        }
    }
    std::for_each(finalVect.begin(), finalVect.end(), [&](std::future<std::vector<std::filesystem::directory_entry>>& fut)
        {
            vectDirEntry lst = fut.get();
            std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
        }
    );
    return vect;
}

int main()
{
    const std::filesystem::directory_entry root = std::filesystem::directory_entry("C:/Test");
    std::future<std::vector<std::filesystem::directory_entry>> fut = std::async(std::launch::async, &ListDirectory2, root);
    auto result = fut.get();
    for (std::filesystem::directory_entry& item : result)
    {
        std::cout << item << '\n';
    }
}
```
There is a separate `vect` for each recursive call. But you return it, and the future generated from `std::async` provides the `vect` from each call. When you do:

```cpp
vectDirEntry lst = fut.get();
std::copy(lst.begin(), lst.end(), std::back_inserter(vect));
```

for each of the `std::async`-dispatched futures, you consume their `vect`s to populate the parent's `vect` (which it in turn returns).
The `lst` in that code is the `vect` returned by one of your recursive calls. The `vect` in that `std::copy` is the `vect` from the current `ListDirectory2` call, implicitly captured by reference (because you began the lambda definition with `[&]`, which means any variables referenced that are not declared within the lambda are implicitly references to the variables in the outer scope).
There's nothing unusual here; you explicitly copied from the sub-`vect`s into the parent `vect` before returning each time, eventually building up a final `vect` in the top-most `ListDirectory2` call that contains the results from every recursive call.
As a side note, you're performing a number of copies that aren't strictly necessary. You could avoid at least some of them by replacing your use of `std::copy` with `std::move` (in addition to the single-argument version that casts its argument to an r-value reference, there's a three-argument version in `<algorithm>`, equivalent to `std::copy`, that moves from the source; since `lst` is a local that is destroyed at the end of each lambda invocation, there's no harm in leaving its elements in a moved-from state). A similar change could be made using the `insert` method of `vect` and `std::make_move_iterator` (which might be slightly faster, since the vector can grow once for the whole batch instead of element by element), but the simple swap from `std::copy` to `std::move` is the minimalist solution and it should be fast enough.
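What you observe has nothing to do with `async` calls but is due to recursion.
Here's a flowchart describing it for 3 directory levels. Each `vect` is given a unique name here (and they are unique instances in the program). For readability, the chart shows each `std::copy` right after its subdirectory; in the actual code, all files are collected and all subdirectory tasks are launched first, and the copies happen afterwards in the `std::for_each` loop.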
```
ListDirectory2(dir)
  vect <- file1            // put all files in dir in the local vect
          file2
  dir1 ---------------> ListDirectory2(dir1)       // call ListDirectory2 for each dir
                        vect1 <- file1.1           // put all files in dir1 in the local vect
                                 file1.2
                        dir1.1 ---------------> ListDirectory2(dir1.1)
                                                ...
                        vect1 <- std::copy <--- return vect1.1
                        dir1.2 ---------------> ListDirectory2(dir1.2)
                                                ...
                        vect1 <- std::copy <--- return vect1.2
  vect <- std::copy <-- return vect1
  dir2 ---------------> ListDirectory2(dir2)
                        vect2 <- file2.1           // put all files in dir2 in the local vect
                                 file2.2
                        dir2.1 ---------------> ListDirectory2(dir2.1)
                                                ...
                        vect2 <- std::copy <--- return vect2.1
                        dir2.2 ---------------> ListDirectory2(dir2.2)
                                                ...
                        vect2 <- std::copy <--- return vect2.2
  vect <- std::copy <-- return vect2
  return vect
```
When the call returns to `main`, `vect` will therefore be populated with all the files encountered from the starting directory and down.