Home > Enterprise >  Compare two LINQ queries and get differences in a datagridview (C#)
Compare two LINQ queries and get differences in a datagridview (C#)

Time:04-30

I have two C# query results (with the same properties). I want to compare them and show the differences (added / removed items and also modified properties) in a DataGridview. Something like this:

enter image description here

What's the best approach to achieve that?

CodePudding user response:

The following may be a possible approach (not claiming it's the best approach).

Assuming that your class is implemented as follows:

public class Item
{
    public int Id { get; set; }
    public int Weight { get; set; }
    public int Phase { get; set; }
}

You could implement a comparison class that takes the previous item (which may be null) and the current item (which may be null) to be compared:

public class ItemComparison
{
    public int Id { get; init; }
    
    public string Weight => GetComparisonString(_weightPrevious, _weightCurrent);
    public string Phase => GetComparisonString(_phasePrevious, _phaseCurrent);
    
    public ItemComparison(Item previousItem, Item currentItem)
    {
        // TODO Error handling: previousItem and currentItem may
        //  - both be null
        //  - both be not null and have different Id values
        
        Id = (previousItem ?? currentItem).Id;
        
        _weightPrevious = previousItem?.Weight;
        _weightCurrent = currentItem?.Weight;
        _phasePrevious = previousItem?.Phase;
        _phaseCurrent = currentItem?.Phase;
    }
    
    private int? _weightPrevious;
    private int? _weightCurrent;
    private int? _phasePrevious;
    private int? _phaseCurrent;
    
    private string GetComparisonString(int? previousValue, int? currentValue)
    {
        if (previousValue == currentValue)
        {
            return "not changed";
        }
        
        return $"{GetStringOrUnknown(previousValue)} --> {GetStringOrUnknown(currentValue)}"; 
    }
    
    private string GetStringOrUnknown(int? value)
    {
        return value.HasValue ? value.ToString() : "?";
    }
}

and compute the updated items, removed items and added items as follows:

//using System.Collections.Generic;
//using System.Linq;

var updatedItems = queryResult1
    .Join(queryResult2,
        previous => previous.Id,
        current => current.Id,
        (previous, current) => new ItemComparison(previous, current));

var removedItems = queryResult1
    .ExceptBy(queryResult2.Select(qr2 => qr2.Id), qr1 => qr1.Id)
    .Select(removed => new ItemComparison(removed, null));

var addedItems = queryResult2
    .ExceptBy(queryResult1.Select(qr1 => qr1.Id), qr2 => qr2.Id)
    .Select(added => new ItemComparison(null, added));

Lastly, the item comparisons can be concatenated to produce the final comparison result:

List<ItemComparison> comparisonResult = updatedItems
    .Concat(removedItems)
    .Concat(addedItems)
    .OrderBy(item => item.Id)
    .ToList();

comparisonResult can then be used to populate your DataGridView.

Example fiddle here.

CodePudding user response:

Assuming that ID never changes, my advise would be to do a full outer join on ID

  • Added elements are elements that were not in query 1, but are in query 2
  • Removed elements are elements that were in query 1 but are not in query 2 anymore
  • Changed elements are elements that were in both query 1 and 2, but are not equal by value.

Alas, class Enumerable doesn't have an extension method for full outer join. Luckily it is fairly easy to create one. If you are not familiar with extension methods, consider to read Extension Methods Demystified (another good reason to use method syntax instead of query syntax)

FullOuterJoin

public static IEnumerable<TResult> FullOuterJoin<T1, T2, TKey, TResult>(
    this IEnumerable<T1> sequence1,
    IEnumerable<T2> sequence2,
    Func<T1, TKey> joinKey1Selector,
    Func<T2, TKey> joinKey2Selector,
    Func<T1, T2, TKey, TResult> resultSelector,
    IEqualityComparer<TKey> keyComparer)
{
    // TODO: implement
}

Just like a lot of LINQ methods, you could help the users of your method by writing several overloads, for instance, one without keyComparer:

public static IEnumerable<TResult> FullOuterJoin<T1, T2, TKey, TResult>(
    this IEnumerable<T1> sequence1,
    IEnumerable<T2> sequence2,
    Func<T1, TKey> joinKey1Selector,
    Func<T2, TKey> joinKey2Selector,
    Func<T1, T2, Tkey, TResult> resultSelector)
{
    return FullOuterJoin(sequence1, sequence2,
                         joinKey1Selector, joinKey2Selector,
                         resultSelector, null);
}

Usage of FullOuterJoin in your problem

Usage could be as follows:

class QueryResult
{
    public int Id {get; set;}
    public decimal Weight {get; set;}           // maybe other type
    public int Phase {get; set;}
}

IEnumerable<QueryResult> query1 = ...
IEnumerable<QueryResult> query2 = ...

// full outer join query1 and query 2 on Id:
var fullOuterJoin = query1.FullOuterJoin(query2,

    queryResult => queryResult.Id,    // from every element from query1 take the Id
    queryResult => queryResult.Id,    // from every element from query2 take the Id

    // parameter resultSelector: from every T1 and its matching T2 make one new
    // note: T1 or T2 can be null (but not both)
    (x, y, key) => new
    {
        Id = key,
        Added = T1 == null,
        Removed = T2 == null,
        Changed = T1 != T2,

        Original = T1,
        Current = T2,
    };

Detect Changes

I think it would be neater to change the properties Added / Removed / Changed into an enum. Create a method for this:

enum ChangeState {Unchanged, Added, Removed, Changed};

ChangeState DetectChange<T>(T x, T y)
{
    return DetectChange(x, y, null); // call the overload with comparer
}

ChangeState DetectChange<T>(T x, T y, IEqualityComparer<T> comparer)
{
    if (comparer == null) comparer = EqualityComparer<T>.Default;

    if (comparer.Equals(x, y)) return ChangeState.Unchanged;
    if (x == null) return ChangeState.Added;   // because y not null
    if (y == null) return ChangeState.Removed; // because x not null
    return ChangeState.Changed;
}

Parameter resultSelector will be like this:

(x, y, key) => new
{
    Id = key,
    ChangeState = DetectChange(x, y),
    ...

Implementation of Full Outer Join

The implementation is fairly simple:

  • Create Dictionaries for sequence1 and sequence2, using TKey as key.
  • Get all distinct Keys from both LookupTables
  • For every Key, get the element from lookup X and lookup Y. One of these two might be null.
  • Use resultSelector to calculate the result
  • Yield return the result

.

public static IEnumerable<TResult> FullOuterJoin<Tx, Ty, TKey, TResult>(
    this IEnumerable<Tx> sequenceX,
    IEnumerable<Ty> sequenceY,
    Func<Tx, TKey> xKeySelector,
    Func<Ty, TKey> yKeySelector,
    Func<Tx, Ty, TKey, TResult> resultSelector,
    IEqualityComparer<TKey> keyComparer)
{
    // TODO: throw exception if any of sequenceX, sequenceY,
    // KeySelectors or resultSelector equal null

    // if keyComparer equals null, use default comparison technique
    if (keyComparer == null) keyComparer = EqualityComparer<TKey>.Default;

    // create two lookupTables:
    IDictionary<TKey, Tx> dictX = sequence1.ToDictionary(x => joinKey1Selector(x), keyComparer);
    IDictionary<TKey, Ty> dictY = sequence1.ToDictionary(y => joinKey2Selector(y), keyComparer);

    // get all used Tkey:
    IEnumerable<TKey> keysX= dictX.Select(x => x.Key);
    IEnumerable<TKey> keysY= dictY.Select(y => y.Key);
    IEnumerable<TKey> allUsedKeys = keysX.Union(keyY, keyComparer);

    // for every used key, get the x and the y and return a result
    foreach(TKey key in allUsedKeys)
    {
        dictX.TryGetValue(key, out Tx foundX);  // null if not found
        dictY.TryGetValue(key, out Ty foundY);
        TResult result = resultSelector(foundX, foundY, key);
        yield return result;
    }
}

Problem if IQueryable

Problem: the simple extension method for full outer join does not work with IQueryable, only with IEnumerable.

If you really want that your Database Management System does the full outer join as Queryable, before returning the data to your local process, you'll have to create an extension method that has IQueryable<...> as input. Maybe this article about Left Outer Join as IQueryable is a good starting point.

If I look at the length of the proposed Left Outer Join, I think that it won't be very efficient. Consider to add a method that does the full outer join as SQL statement to your DbContext.

  • Related