Home > front end >  Linq Except not giving desired results in C#, datatable
Linq Except not giving desired results in C#, datatable

Time:02-10

I have two DataTables. I applied the Except operator as follows, and got either unfiltered or undesired results.

resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable()).CopyToDataTable();

Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?

Note: Both DataTables are plain simple with just one column.

  • dtA contains a dozen rows of strings.
  • dtB contains thousands of rows of strings.

I also tried the same syntax with another use case the set operator, Intersect. This does not work either.

resultDataTable2 =dtA.AsEnumerable().Intersect(dtB.AsEnumerable()).CopyToDataTable();

CodePudding user response:

Except will use default comparer ie. it will compare references.

I think you are expecting to filter result and comparison is based on members.

I will recommend you to implement your own IEqualityComparer to compare two objects based on member.

e.g.


resultDataTable = dtA.AsEnumerable().Except(dtB.AsEnumerable(), new TestComparer()).CopyToDataTable();

class TestComparer : IEqualityComparer<MyTestClass>
{
    public bool Equals(MyTestClass b1, MyTestClass b2)
    {
        if (b2 == null && b1 == null)
           return true;
        else if (b1 == null || b2 == null)
           return false;
        else if(b1.Prop1 == b2.Prop1 && b1.Prop2 == b2.Prop2) // ToDo add more check based on class
            return true;
        else
            return false;
    }

    public int GetHashCode(MyTestClass)
    {
        int hCode = MyTestClass.Height ^ MyTestClass.Length ^ ....; // Add more based on class properties
        return hCode.GetHashCode();
    }
}

Doc https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.except?view=net-6.0#system-linq-enumerable-except-1(system-collections-generic-ienumerable((-0))-system-collections-generic-ienumerable((-0))-system-collections-generic-iequalitycomparer((-0)))

CodePudding user response:

Could anyone please kindly explain to me why Except(dtB.AsEnumerable()) is not the way to put it?

When you do a.Except(b) the contents of b are loaded into a hash set. A hash set is a device that doesn't accept duplicates, so it returns false when you try to add something that is already there inside it.

After b is loaded into the hash set, then a is looped over, also being added to the hash set. Anything that adds successfully (set.Add returns true because it is not already there) is returned. Anything that is already there (set.Add returns false because it was added by being in b, or appearing earlier in a) is not returned. You should note that this process also dedupes a, so 1,1,2,3 except 2,3 would return just a single 1. You've achieved "every unique thing in a that isn't in b" - but do check whether you wanted a to be deduped too

A hash set is a wonderful thing, that enables super fast lookups. To do this it relies on two methods that every object has: GetHashcode and Equals. GetHashcode converts an object into a probably-unique number. Equals makes absolutely sure an object A equals B. Hash set tracks all the hashcodes and objects it's seen before, so when you add something it first gets the hashcode of what youre trying to add.. If it never saw that hashcode before it adds the item. If you try add anything that has the same hashcode as something it saw already, it uses Equals to check whether or not it's the same as what it saw it already (sometimes hashcodes are the same for different data) and adds the item if Equals declares it to be different. This whole operation is very fast, much faster than searching object by object through all the objects it saw before.

By default any class in C# gets its implementation of GetHashcode and Equals from object. Object's versions of these methods essentially return the memory address for GetHashcode, and compare the memory addresses for Equals

This works fine for stuff that really is at the same memory address:

var p = new Person(){Name="John"};
var q = p;                         //same mem address as p

But it doesn't work for objects that have the same data but live at different memory addresses:

var p = new Person(){Name="John"};
var q = new Person(){Name="John"}; //not same mem address as p

If you define two people as being equal if they have the same name, and you want C# to consider them equal in the same way, you have to instruct C# to compare the names, not the memory addresses.


A DataRow is like Person above: just because it has the same data as another DataRow, doesn't mean it's the same row in C#'s opinion. Further, because a single DataRow cannot belong to two datatables, it's certain that the "John" in row 1 of dtA, is a different object to the "John" in row 1 of dtB..

By defaul Equals returns false for these two data rows, so Except will never consider them equal and remove dtA's John because of the presence of John in dtB..

..unless you provide an alternative comparison strategy that overrides C#s default opinion of equality. That might look like:

  • provide a comparer, like Kalpesh's answer, typos on the class name aside),
  • override Equals/GetHashcode for the datarows so they work off column data, not memory addresses,
  • or use some other thing that does already have Equals and GetHashcode that work off data rather than memory addresses

As these are just datatables of a single column full of strings they're notionally not much more than an array of string. If we make them into an array of strings, when we do a.Except(b) we will be comparing strings. By default C#s opinion of whether one string equals another is based on the data content of the string rather than the memory address it lives at1, so you can either use string arrays/lists to start with or convert your dtA/B to a string array:

var arrA = dtA.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var arrB = dtB.Rows.Cast<DataRow>().Select(r => r[0] as string).ToArray();
var result = arra.Except(arrB);

Techncially we don't even need to call ToArray()..

If you really need the result to be a datatable, make one and add all the strings to it:

var resultDt = new DataTable();
resultDt.Columns.Add("x");
foreach(var s in result)
  resultDt.Rows.Add(s);

1: we'll ignore interning for now

  • Related