I have a FileInfo list of more than 200 log files from a directory. Most of the files need to be in the list, but a few files should be ignored.
Here is an example of the File-List:
- A300a1_ContentLink.log
- A301a20_ContentLink.log
- A1_4a0_ContentLink.log
- B200a101_ContentLink.log
- B200a101_ContentLink_20221208_115905.log
- B200a101_ContentLink_20221208_115907.log
- B200a101_ContentLink_20221208_120647.log
- B201a1_ContentLink.log
- B202a0_ContentLink.log
Explanation of the file name: the first characters refer to a room (e.g. room A300 or A1). A room can have any description, e.g. B200, CXS2 or just CDD. The next part is the device name (e.g. device a1 or device a20). Each device name starts with a, followed by 1-3 digits. The last part of each file name is "_ContentLink".
All files with a further suffix, like _20221208_115905, are copies of older versions that are needed by other programs, but not in my list.
My problem is that I only need the newest version of each log file in my FileInfo list.
I initialized a FileInfo[] allFiles that contains all of the files of the directory. Next I initialized a new FileInfo[] in which I would like to store only the newest version of each file.
My first attempt was to compare the LastWriteTime:
    FileInfo currentFile = allFiles[0];
    foreach (FileInfo file in allFiles)
    {
        if (file.LastWriteTime > currentFile.LastWriteTime)
        {
            currentFile = file;
        }
    }
But I only get back the latest file of the whole folder.
Now I am thinking about using regular expressions instead of .LastWriteTime, to exclude all files that have a suffix after ContentLink.
But I don't know how to do that, or how to remove the outdated files from the list of all files (or transfer only the relevant ones to a new FileInfo[]).
Thank you in advance for your ideas.
CodePudding user response:
You can use a LINQ query to:
- extract the name and time part from each file name
- group the files by name and
- select the latest (maximum) file by time
Something like:
    // The dot before "log" is escaped and the pattern is anchored at the end,
    // so only names that really end in ".log" match.
    var regex = new Regex(@"^(.*?)_ContentLink(.*?)\.log$");
    var latest = allFiles.Select(f =>
        {
            var parts = regex.Match(f.Name);
            return new {
                File = f,
                Name = parts.Groups[1].ToString(),
                Date = parts.Groups[2].ToString()
            };
        })
        .GroupBy(f => f.Name)
        .Select(g => g.MaxBy(f => f.Date).File)
        .ToArray();
    foreach (var file in latest)
    {
        Console.WriteLine(file.Name);
    }
This produces
A300a1_ContentLink.log
A301a20_ContentLink.log
A1_4a0_ContentLink.log
B200a101_ContentLink_20221208_120647.log
B201a1_ContentLink.log
B202a0_ContentLink.log
MaxBy was added in .NET 6. Before that you can use the equivalent method from the MoreLINQ library.
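If adding a library is not an option, the same per-group selection can be approximated with plain LINQ by sorting each group in descending order and taking the first element. The following is only a sketch: the sample array stands in for the real directory listing (FileInfo objects can be created for names that do not exist on disk), and the ordinal string comparison is an assumption about how the time suffixes should be ordered.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Sample names taken from the question; the files do not need to exist.
var allFiles = new[]
{
    "B200a101_ContentLink.log",
    "B200a101_ContentLink_20221208_115905.log",
    "B200a101_ContentLink_20221208_120647.log",
    "B201a1_ContentLink.log",
}.Select(n => new FileInfo(n)).ToArray();

var regex = new Regex(@"^(.*?)_ContentLink(.*?)\.log$");

var latest = allFiles
    .Select(f =>
    {
        var parts = regex.Match(f.Name);
        return new { File = f, Name = parts.Groups[1].Value, Date = parts.Groups[2].Value };
    })
    .GroupBy(x => x.Name)
    // Sort each group by its time part, largest first, and keep the first entry.
    .Select(g => g.OrderByDescending(x => x.Date, StringComparer.Ordinal).First().File)
    .ToArray();

foreach (var file in latest)
{
    Console.WriteLine(file.Name);
}
```

Sorting each group is slightly more work than a single MaxBy pass, but for a few hundred files the difference should not matter.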
The regular expression captures the smallest possible string before _ContentLink in the first group (.*?) and the smallest possible date part in the second group.
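To see what the two groups capture, it can help to run the pattern against a couple of the sample names. This is a small illustration only; the pattern used here escapes the dot and anchors the end of the name, and the variable names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

var regex = new Regex(@"^(.*?)_ContentLink(.*?)\.log$");

var names = new[]
{
    "A1_4a0_ContentLink.log",
    "B200a101_ContentLink_20221208_115905.log",
};

foreach (var name in names)
{
    var m = regex.Match(name);
    // Group 1 is the room/device part, group 2 is the (possibly empty) time suffix.
    Console.WriteLine($"name='{m.Groups[1].Value}' time='{m.Groups[2].Value}'");
}
// name='A1_4a0' time=''
// name='B200a101' time='_20221208_115905'
```

Files without a suffix yield an empty time part, which sorts before any non-empty suffix when the groups are compared as strings.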
You could get a bit fancier and use different regular expressions to capture the name and time part. Combined with local functions, this results in a somewhat cleaner query:
    // Escaped dot and end anchor so that only the ".log" extension matches.
    var nameRex = new Regex(@"^(.*?)_ContentLink.*\.log$");
    var timeRex = new Regex(@"^.*_ContentLink(.*?)\.log$");
    string NamePart(FileInfo f)
    {
        return nameRex.Match(f.Name).Groups[1].ToString();
    }
    string TimePart(FileInfo f)
    {
        return timeRex.Match(f.Name).Groups[1].ToString();
    }
    var latest = allFiles
        .GroupBy(NamePart)
        .Select(g => g.MaxBy(TimePart))
        .ToArray();
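Note that the queries above keep the timestamped copy for B200a101. If the suffixed copies should simply be dropped, as the question's description of the file names suggests, a plain filter may be enough. The sketch below assumes the naming rules from the question (any room text, a device of the form a plus 1-3 digits, then exactly _ContentLink.log); currentRex and currentFiles are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Sample names from the question; the files do not need to exist.
var allFiles = new[]
{
    "A300a1_ContentLink.log",
    "A301a20_ContentLink.log",
    "A1_4a0_ContentLink.log",
    "B200a101_ContentLink.log",
    "B200a101_ContentLink_20221208_115905.log",
    "B200a101_ContentLink_20221208_115907.log",
    "B200a101_ContentLink_20221208_120647.log",
    "B201a1_ContentLink.log",
    "B202a0_ContentLink.log",
}.Select(n => new FileInfo(n)).ToArray();

// Room (any text), device "a" + 1-3 digits, then exactly "_ContentLink.log".
var currentRex = new Regex(@"^.+a\d{1,3}_ContentLink\.log$");

// Keep only the files whose name has no suffix after "_ContentLink".
FileInfo[] currentFiles = allFiles
    .Where(f => currentRex.IsMatch(f.Name))
    .ToArray();

foreach (var file in currentFiles)
{
    Console.WriteLine(file.Name);
}
```

With the sample names this keeps the six files without a timestamp suffix and skips the three B200a101 copies.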