c# - Remove old Files from FileInfo list

Time:12-08

I have a FileInfo list of more than 200 log files from a directory. Most of the files need to be in the list, but a few files should be ignored.

Here is an example of the File-List:

  • A300a1_ContentLink.log
  • A301a20_ContentLink.log
  • A1_4a0_ContentLink.log
  • B200a101_ContentLink.log
  • B200a101_ContentLink_20221208_115905.log
  • B200a101_ContentLink_20221208_115907.log
  • B200a101_ContentLink_20221208_120647.log
  • B201a1_ContentLink.log
  • B202a0_ContentLink.log

Explanation of the file name: the first characters identify a room (e.g. room A300 or A1). A room can have any designation, e.g. B200, CXS2 or just CDD. The next part is the device name (e.g. device a1 or device a20). Each device name starts with a, followed by 1-3 digits. The last part of each file name is "_ContentLink".

All files with an additional suffix, such as _20221208_115905, are copies of older versions. Other programs need them, but my list does not.

My problem is that I only need the newest File of each logfile in my File-Info-List.

I initialized a FileInfo[] allFiles that contains all of the files of the directory. Next I initialized a new FileInfo[] in which I would like to store only the newest version of each file.

My first attempt was to compare the LastWriteTime:

            FileInfo currentFile = allFiles[0];

            foreach (FileInfo file in allFiles)
            {
                if (file.LastWriteTime > currentFile.LastWriteTime)
                {
                    currentFile = file;
                }
            }

But I only get back the latest file of the whole folder.

Now I am thinking about using regular expressions instead of .LastWriteTime to exclude all files that have a suffix after ContentLink.

But I don't know how to do that, or how to remove the outdated files from the list of all files (or copy only the relevant ones into a new FileInfo[]).

Thank you in advance for your ideas.

CodePudding user response:

You can use a LINQ query to:

  • extract the name and time part from each file name
  • group the files by name and
  • select the latest (maximum) file by time

Something like:

var regex=new Regex(@"^(.*?)_ContentLink(.*?)\.log$");
    
var latest=allFiles.Select(f=>{ 
                             var parts=regex.Match(f.Name);
                             return new {
                                 File=f,
                                 Name=parts.Groups[1].ToString(),
                                 Date=parts.Groups[2].ToString()
                             };
                         })
              .GroupBy(f=>f.Name)
              .Select(g=>g.MaxBy(f=>f.Date).File)
              .ToArray();

foreach(var file in latest)
{
    Console.WriteLine(file.Name);
}

This produces

A300a1_ContentLink.log
A301a20_ContentLink.log
A1_4a0_ContentLink.log
B200a101_ContentLink_20221208_120647.log
B201a1_ContentLink.log
B202a0_ContentLink.log

MaxBy was added in .NET 6. Before that you can use the equivalent method from the MoreLINQ library.
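
If adding MoreLINQ is not an option, the same selection can be sketched with standard LINQ only, by sorting each group in descending order and taking the first element. The tuple shape and the names LatestPerName and Pick below are hypothetical and only stand in for the anonymous type used in the query above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LatestPerName
{
    // Picks, for every name, the entry with the greatest date suffix.
    // Works on frameworks without MaxBy (before .NET 6).
    public static (string Name, string Date)[] Pick(IEnumerable<(string Name, string Date)> entries)
    {
        return entries
            .GroupBy(e => e.Name)
            // Ordinal comparison: "" sorts before "_2022...", and the
            // yyyyMMdd_HHmmss suffixes sort chronologically as strings.
            .Select(g => g.OrderByDescending(e => e.Date, StringComparer.Ordinal).First())
            .ToArray();
    }

    static void Main()
    {
        // Hypothetical sample data in the shape the query would extract.
        var entries = new[]
        {
            (Name: "B200a101", Date: ""),
            (Name: "B200a101", Date: "_20221208_115905"),
            (Name: "B200a101", Date: "_20221208_120647"),
            (Name: "B201a1", Date: "")
        };

        foreach (var e in Pick(entries))
            Console.WriteLine($"{e.Name}_ContentLink{e.Date}.log");
    }
}
```

Sorting every group costs more than a single pass, but for a few hundred files the difference should not matter.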

The regular expression captures the smallest possible string before _ContentLink in the first group (.*?) and the smallest possible date part in the second group.
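
To see what the two groups contain, here is a small sketch that applies the pattern to two of the sample names. It assumes the intent is to match ".log" literally at the end of the name, so the dot is escaped and the pattern is anchored with $. The helper name FileNameParts.Split is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

static class FileNameParts
{
    // Dot escaped and end anchored so that ".log" is matched literally.
    static readonly Regex Pattern = new Regex(@"^(.*?)_ContentLink(.*?)\.log$");

    // Returns the room/device prefix and the optional timestamp suffix.
    // For a name that does not match, both parts are empty strings.
    public static (string Name, string Date) Split(string fileName)
    {
        Match m = Pattern.Match(fileName);
        return (m.Groups[1].Value, m.Groups[2].Value);
    }

    static void Main()
    {
        string[] names =
        {
            "A1_4a0_ContentLink.log",
            "B200a101_ContentLink_20221208_115905.log"
        };

        foreach (var n in names)
        {
            var (name, date) = Split(n);
            Console.WriteLine($"{n}: name='{name}', date='{date}'");
        }
    }
}
```

For the first name the date part is empty, for the second it is "_20221208_115905". An empty string sorts before any suffix, which is why the suffixed file wins in the grouping.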

You could get a bit fancier and use different regular expressions to capture the name and time part. Combined with local functions, this results in a somewhat cleaner query:

    var nameRex=new Regex(@"^(.*?)_ContentLink.*\.log$");
    var timeRex=new Regex(@"^.*_ContentLink(.*?)\.log$");
    
    string NamePart(FileInfo f)
    {
        return nameRex.Match(f.Name).Groups[1].ToString();
    }

    string TimePart(FileInfo f)
    {
        return timeRex.Match(f.Name).Groups[1].ToString();
    }
    
    var latest=allFiles
              .GroupBy(NamePart)
              .Select(g=>g.MaxBy(TimePart))            
              .ToArray();
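
If, as the question describes, the suffixed files are always older copies and the file without a suffix is the current one, a simpler option may be to drop every name that has anything between _ContentLink and .log. This is a minimal sketch under that assumption. The class and method names are hypothetical, and the sample array only stands in for the question's allFiles:

```csharp
using System;
using System.IO;
using System.Linq;

static class CurrentLogs
{
    // Keeps only files named "<room><device>_ContentLink.log",
    // i.e. those without a timestamp suffix.
    public static FileInfo[] WithoutSuffix(FileInfo[] allFiles)
    {
        return allFiles
            .Where(f => f.Name.EndsWith("_ContentLink.log", StringComparison.OrdinalIgnoreCase))
            .ToArray();
    }

    static void Main()
    {
        // FileInfo.Name can be read even if the file does not exist,
        // so plain names are enough for a demonstration.
        FileInfo[] allFiles = new[]
        {
            "A300a1_ContentLink.log",
            "B200a101_ContentLink.log",
            "B200a101_ContentLink_20221208_115905.log",
            "B201a1_ContentLink.log"
        }.Select(n => new FileInfo(n)).ToArray();

        foreach (var f in WithoutSuffix(allFiles))
            Console.WriteLine(f.Name);
    }
}
```

Unlike the grouping query, this never returns a suffixed file, so a device that only has suffixed copies would disappear from the result.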