Better way to detect file differences between 2 directories?

Time:01-14

I made some C# functions to roughly "diff" 2 directories, similar to KDiff3.

First, this function compares file names between the directories. Any file name found under dir1 but not under dir2 is treated as a file that has been added to dir1:

public static List<string> diffFileNamesInDirs(string dir1, string dir2)
{
    List<string> dir1FileNames = Directory
       .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
       .Select(Path.GetFullPath)
       .Select(entry => entry.Replace(dir1 + "\\", "")) //Strip the base directory to get a relative path
       .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", "")) //Strip the base directory to get a relative path
        .ToList();
    List<string> diffs = dir1FileNames.Except(dir2FileNames).Distinct().ToList();

    return diffs;
}
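
The string Replace above only strips the prefix when dir1 and dir2 are passed as full paths without a trailing separator. A minimal sketch of the same name comparison that avoids that assumption is shown here. The class and method names (RelativeNames, ListFiles, OnlyInFirst) are hypothetical, Path.GetRelativePath requires .NET Core 2.0 or later (or .NET Standard 2.1), and the case-insensitive comparer is an assumption that suits Windows file systems:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class RelativeNames
{
    // Hypothetical helper: paths of all files under root, relative to root.
    public static HashSet<string> ListFiles(string root)
    {
        string fullRoot = Path.GetFullPath(root);
        return new HashSet<string>(
            Directory.EnumerateFiles(fullRoot, "*", SearchOption.AllDirectories)
                     .Select(file => Path.GetRelativePath(fullRoot, file)),
            StringComparer.OrdinalIgnoreCase); // assumption: case-insensitive file system
    }

    // Files that exist under dir1 but not under dir2.
    public static List<string> OnlyInFirst(string dir1, string dir2)
    {
        HashSet<string> names = ListFiles(dir1);
        names.ExceptWith(ListFiles(dir2)); // set difference using the set's comparer
        return names.OrderBy(n => n, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

Calling RelativeNames.OnlyInFirst(dir1, dir2) would then play the role of diffFileNamesInDirs(dir1, dir2). On a case-sensitive file system, StringComparer.Ordinal would be the safer choice.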

Second, this function compares file sizes for file names that exist in both directories. Any difference in file size implies that the file has been edited:

public static List<string> diffFileSizesInDirs(string dir1, string dir2)
{
    //Get list of file paths, relative to the base dir1/dir2 directories
    List<string> dir1FileNames = Directory
       .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
       .Select(Path.GetFullPath)
       .Select(entry => entry.Replace(dir1 + "\\", ""))
       .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    List<string> sharedFileNames = dir1FileNames.Intersect(dir2FileNames).Distinct().ToList();

    //Get list of file sizes corresponding to file paths
    List<long> dir1FileSizes = sharedFileNames
        .Select(s =>
        new FileInfo(dir1 + "\\" + s) //Create the full file path as required for FileInfo objects
        .Length).ToList();
    List<long> dir2FileSizes = sharedFileNames
        .Select(s =>
        new FileInfo(dir2 + "\\" + s) //Create the full file path as required for FileInfo objects
        .Length).ToList();

    List<string> changedFiles = new List<string>();
    for (int i = 0; i < sharedFileNames.Count; i++)
    {
        //If file sizes are different, there must have been a change made to one of the files. 
        if (dir1FileSizes[i] != dir2FileSizes[i])
        {
            changedFiles.Add(sharedFileNames[i]);
        }
    }

    return changedFiles;
}

Lastly, combining the results gives a list of all files that have been added or edited between the directories:

List<string> nameDiffs = FileIO.diffFileNamesInDirs(dir1, dir2);
List<string> sizeDiffs = FileIO.diffFileSizesInDirs(dir1, dir2);
List<string> allDiffs = nameDiffs.Concat(sizeDiffs).ToList();
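
The two functions above each enumerate both directory trees, and the size check then builds a fresh FileInfo for every shared file. As a rough sketch of one alternative, each tree can be enumerated once into a dictionary of relative path to size, and added and changed files classified in a single loop. The names DirDiff, Snapshot and Compare are hypothetical, and Path.GetRelativePath again assumes .NET Core 2.0 or later:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class DirDiff
{
    // Maps each file's path relative to root onto its size in bytes.
    private static Dictionary<string, long> Snapshot(string root)
    {
        var info = new DirectoryInfo(root);
        return info.EnumerateFiles("*", SearchOption.AllDirectories)
                   .ToDictionary(
                       f => Path.GetRelativePath(info.FullName, f.FullName),
                       f => f.Length);
    }

    // Added: present only under dir1. Changed: present in both, sizes differ.
    public static (List<string> Added, List<string> Changed) Compare(string dir1, string dir2)
    {
        Dictionary<string, long> first = Snapshot(dir1);
        Dictionary<string, long> second = Snapshot(dir2);
        var added = new List<string>();
        var changed = new List<string>();

        foreach (KeyValuePair<string, long> entry in first)
        {
            if (!second.TryGetValue(entry.Key, out long otherSize))
                added.Add(entry.Key);
            else if (otherSize != entry.Value)
                changed.Add(entry.Key);
        }
        return (added, changed);
    }
}
```

Like the original, this sketch still treats equal size as "unchanged", so it shares the same blind spot for edits that keep the file length.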

This approach generally works, but it feels sloppy, and it would fail for the case where a file is modified but still has the same size, so the size check wrongly reports the two copies as equal. Any suggestions for a better way?

Answer:

You could use System.Security.Cryptography.MD5 to calculate an MD5 hash for each file and compare the hashes.

For example, using this method:

public static string GetMd5Hash(string path)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(path))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}

This may take somewhat longer than getting values from FileInfo (depending on the number and size of the files to compare), but matching hashes let you be confident that the files are binary identical, barring deliberately constructed MD5 collisions.
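
One way to limit that extra cost, sketched here under the assumption that most changed files also change size, is to keep the cheap size comparison as a first filter and hash only the pairs whose sizes match. The FileCompare class and its method names are hypothetical and not part of the code above:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class FileCompare
{
    // Hypothetical helper: true when both files appear to have identical contents.
    public static bool FilesAreEqual(string path1, string path2)
    {
        // Files of different length can never be byte-identical, so skip hashing.
        if (new FileInfo(path1).Length != new FileInfo(path2).Length)
            return false;

        // Same length: compare MD5 hashes of the full contents.
        return HashFile(path1).SequenceEqual(HashFile(path2));
    }

    private static byte[] HashFile(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return md5.ComputeHash(stream);
        }
    }
}
```

In the question's size-comparison loop, a call such as FileCompare.FilesAreEqual(dir1 + "\\" + name, dir2 + "\\" + name) could replace the plain size test, so that same-size edits are also reported.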
