I made some C# functions to roughly "diff" two directories, similar to KDiff3.
First, this function compares file names between the directories. Any file name that appears in dir1 but not in dir2 implies a file has been added to dir1:
public static List<string> diffFileNamesInDirs(string dir1, string dir2)
{
    //Get list of file paths, relative to the base dir1/dir2 directories
    List<string> dir1FileNames = Directory
        .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir1 + "\\", ""))
        .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    //Names present in dir1 but missing from dir2
    List<string> diffs = dir1FileNames.Except(dir2FileNames).Distinct().ToList();
    return diffs;
}
Second, this function compares file sizes for file names that exist in both directories. Any difference in file size implies that an edit has been made:
public static List<string> diffFileSizesInDirs(string dir1, string dir2)
{
    //Get list of file paths, relative to the base dir1/dir2 directories
    List<string> dir1FileNames = Directory
        .EnumerateFiles(dir1, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir1 + "\\", ""))
        .ToList();
    List<string> dir2FileNames = Directory
        .EnumerateFiles(dir2, "*", SearchOption.AllDirectories)
        .Select(Path.GetFullPath)
        .Select(entry => entry.Replace(dir2 + "\\", ""))
        .ToList();
    List<string> sharedFileNames = dir1FileNames.Intersect(dir2FileNames).Distinct().ToList();
    //Get list of file sizes corresponding to file paths
    List<long> dir1FileSizes = sharedFileNames
        .Select(s =>
            new FileInfo(dir1 + "\\" + s) //Create the full file path as required for FileInfo objects
            .Length).ToList();
    List<long> dir2FileSizes = sharedFileNames
        .Select(s =>
            new FileInfo(dir2 + "\\" + s) //Create the full file path as required for FileInfo objects
            .Length).ToList();
    List<string> changedFiles = new List<string>();
    for (int i = 0; i < sharedFileNames.Count; i++)
    {
        //If file sizes are different, there must have been a change made to one of the files.
        if (dir1FileSizes[i] != dir2FileSizes[i])
        {
            changedFiles.Add(sharedFileNames[i]);
        }
    }
    return changedFiles;
}
Lastly, combining the results gives a list of all files that have been added or edited between the directories:
List<string> nameDiffs = FileIO.diffFileNamesInDirs(dir1, dir2);
List<string> sizeDiffs = FileIO.diffFileSizesInDirs(dir1, dir2);
List<string> allDiffs = nameDiffs.Concat(sizeDiffs).ToList();
This approach generally works, but it feels sloppy, and it would miss the case where a file is modified but still has the same size. Any suggestions for a better way?
CodePudding user response:
You could use System.Security.Cryptography.MD5 to calculate an MD5 hash for each file and compare the hashes.
For example, using this method:
public static string GetMd5Hash(string path)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = File.OpenRead(path))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
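This may take a little longer than getting values from FileInfo (depending on the number and size of the files to compare), because every file has to be read in full. In return, different hashes prove the files differ, and matching hashes make it practically certain that the files are binary identical, since an accidental MD5 collision is extremely unlikely.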
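To show how the hash could slot into the size comparison from the question, here is a rough sketch rather than a definitive implementation. The class name DirDiffSketch and the helpers RelativeFiles and DiffFileContentsInDirs are made up for illustration. It assumes Path.GetRelativePath is available (.NET Core 2.0 or later; it does not exist on .NET Framework). Files are only hashed when their sizes match, so the cheap size check still filters out most edits.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DirDiffSketch
{
    // Same hashing approach as GetMd5Hash above.
    public static string GetMd5Hash(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            var hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Hypothetical helper: paths of all files under root, relative to root.
    // Assumes Path.GetRelativePath exists on the target framework.
    private static HashSet<string> RelativeFiles(string root)
    {
        string fullRoot = Path.GetFullPath(root);
        return new HashSet<string>(
            Directory.EnumerateFiles(fullRoot, "*", SearchOption.AllDirectories)
                     .Select(f => Path.GetRelativePath(fullRoot, f)));
    }

    // Hypothetical replacement for diffFileSizesInDirs: returns the relative
    // paths of files that exist in both directories but differ in content.
    public static List<string> DiffFileContentsInDirs(string dir1, string dir2)
    {
        HashSet<string> shared = RelativeFiles(dir1);
        shared.IntersectWith(RelativeFiles(dir2));

        var changed = new List<string>();
        foreach (string rel in shared)
        {
            string p1 = Path.Combine(dir1, rel);
            string p2 = Path.Combine(dir2, rel);

            // Different sizes already prove a change, so the files are only
            // read and hashed when the sizes are equal.
            bool differs = new FileInfo(p1).Length != new FileInfo(p2).Length
                           || GetMd5Hash(p1) != GetMd5Hash(p2);
            if (differs)
            {
                changed.Add(rel);
            }
        }
        return changed;
    }
}
```

The result can be concatenated with the output of diffFileNamesInDirs in the same way as sizeDiffs in the question. The order of the returned paths is not guaranteed, because the shared names are held in a HashSet.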