I have 2 DataTables: 1st DataTable is created today and 2nd Datatable was created yesterday. Each DataTable contains over 100K rows and 30 columns. I have a unique column - "Master_ID"
I want to compare and subtract (Today's DataTable - Yesterday's DataTable) and get the rows that are updated, deleted, and newly created. I want to perform a complete row-to-row comparison.
Output -> Dictionary <string, List DataRow >
- Dictionary <"New Data Rows", List DataRow >
- Dictionary <"Updated Data Rows", List DataRow >
- Dictionary <"Deleted Data Rows", List DataRow >
I am looking for a time-efficient approach, probably using LINQ.
CodePudding user response:
Probably this code will help you. It takes each row in the new DataTable
and looks for it in the old DataTable
(to get the created and updated rows) and then takes all rows in the old table which weren't found in new table (to get deleted rows). It will work very slowly if your tables contain over 100K rows, so you'd better put it into a separate thread.
My code will treat two different rows as one updated row if their Master_ID
column is the same.
DataTable oldTable, newTable; // Put there your tables
Dictionary<int, DataRow> deletedRows, newRows = new Dictionary<int, DataRow>(); // Here will be the result (key — row's Master_ID, value — the row)
var updatedRows = new Dictionary<int, Tuple<DataRow, DataRow>>(); // Here will be the result (key — row's Master_ID, value — tuple (first item — old row version, second item — new row version))
var commonIds = new List<int>();
foreach (var row in newTable.Rows) {
var id = row["Master_ID"];
if (oldTable.Contains(id)) {
commonIds.Add((int)id);
var rowInOldTable = oldTable.Rows.Find(id);
foreach (var column in oldTable.Columns) {
if (row[column] != rowInOldTable[column]) {
updatedRows.Add((int)id, Tuple.Create<DataRow, DataRow>(rowInOldTable, row));
break;
}
}
} else {
newRows.Add((int)id, row);
}
}
deletedRows = (from row in oldTable.Rows
where !commonIds.Contains((int)row["Master_ID"]))
.ToDictionary<DataRow, int>(row => (int)row["Master_ID"]);
CodePudding user response:
Probably one of the most efficient approaches is to use a dictionary-like collection to find out if a row-identifier is new, i'd use ILookup<TKey, TValue>
:
public static Dictionary<string, List<DataRow>?> GetChanges(DataTable dtNew, DataTable dtOld, string masterKeyName, StringComparer masterKeyComparer = null)
{
IEqualityComparer<string> comparer = masterKeyComparer ?? StringComparer.Ordinal;
ILookup<string, DataRow> newKeyLookup = dtNew.AsEnumerable().ToLookup(r => r.Field<string>(masterKeyName), comparer!)!;
ILookup<string, DataRow> oldKeyLookup = dtOld.AsEnumerable().ToLookup(r => r.Field<string>(masterKeyName), comparer!)!;
List<DataRow> updatedRows = new();
List<DataRow> newRows = new();
List<DataRow> deletedRows = new();
foreach (var x in newKeyLookup)
{
List<DataRow> updates = oldKeyLookup[x.Key].ToList();
updatedRows.AddRange(updates);
if (!updates.Any())
{
newRows.AddRange(x);
}
}
foreach (var x in oldKeyLookup)
{
if (!newKeyLookup[x.Key].Any())
{
deletedRows.AddRange(x);
}
}
return new Dictionary<string, List<DataRow>?>
{
{"New Data Rows", newRows},
{"Updated Data Rows", updatedRows},
{"Deleted Data Rows", deletedRows},
};
}