I am trying to write a LINQ query over a set of files to find the file names that contain a specific string.
I was using:
var docs = Directory.EnumerateFiles(searchFolder, "*" + strNumber + "*", SearchOption.AllDirectories);
That was working fine, but some of my file searches were taking 30 minutes because one of the directories has 1 million files. I was hoping to speed up the search with a PLINQ query. However, although my syntax is good, I'm not getting the results I expect. It looks like my problem may be in the Where clause.
foreach (var strNumber in strNumbers)
{
    DirectoryInfo searchDirectory = new DirectoryInfo(searchFolder);
    IEnumerable<System.IO.FileInfo> allDocs = searchDirectory.EnumerateFiles("*", SearchOption.AllDirectories);
    IEnumerable<System.IO.FileInfo> docsToProcess = strNumbers
        .SelectMany(number => allDocs
            .Where(file => file.Name.Contains(number)))
        .Distinct();
}
Any help would be much appreciated.
CodePudding user response:
I would change the order of operations:
- Create a list of all files (into memory)
- Perform the search over the memory list
Then you can run a Parallel.ForEach over the in-memory array, and disk access is limited to the initial enumeration.
var searchDirectory = new DirectoryInfo(searchFolder);
// ToArray() forces the enumeration once, so every later search runs against memory, not disk
var allDocs = searchDirectory.EnumerateFiles("*", SearchOption.AllDirectories).ToArray();
// For extra points, use a Parallel.ForEach here for multi-threaded work
Parallel.ForEach(strNumbers, strNumber =>
{
    // Work on allDocs here, it should be in memory
});
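To make the "search over the memory list" step concrete, here is a minimal sketch of one way it could look. It is not the only option. Instead of looping over the numbers and scanning the file list once per number, it scans the file list once with PLINQ and tests each name against all the numbers, so each file appears at most once and no Distinct() is needed. The class name FileNameSearch and the methods LoadAllPaths and FindMatches are made up for illustration, and it works on path strings rather than FileInfo objects.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileNameSearch
{
    // Hypothetical helper: walks the directory tree once and keeps every path in memory.
    public static string[] LoadAllPaths(string searchFolder)
    {
        return Directory
            .EnumerateFiles(searchFolder, "*", SearchOption.AllDirectories)
            .ToArray();
    }

    // Hypothetical helper: returns the paths whose file name (not directory name)
    // contains at least one of the given numbers. Works purely on the in-memory
    // list, so no disk access happens here. Result order is not guaranteed.
    public static List<string> FindMatches(
        IReadOnlyList<string> allPaths,
        IReadOnlyCollection<string> strNumbers)
    {
        return allPaths
            .AsParallel()
            .Where(path =>
            {
                string name = Path.GetFileName(path);
                // Ordinal, case-sensitive substring check against every number
                return strNumbers.Any(number => name.Contains(number));
            })
            .ToList();
    }
}
```

Usage would be along the lines of `var matches = FileNameSearch.FindMatches(FileNameSearch.LoadAllPaths(searchFolder), strNumbers);`. Whether the parallel version is actually faster than a plain sequential Where depends on how many numbers you test per file, so it is worth timing both on your data.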