Home > Software design >  How can I extract all Unique / Distinct Rows from a Datatable and save these rows in a new Datatable
How can I extract all Unique / Distinct Rows from a Datatable and save these rows in a new Datatable

Time:06-09

The Problem

I have a DataTable containing several columns and rows. One of these column names is "logon". Some Rows in this DataTable have the same Column "logon". For example, 2 Rows with a "logon" of test123. I need a new DataTable containing only one of these 2 Rows (which one doesn't matter).

The Question

How can I create a new DataTable from the old DataTable, filtering out Rows with a duplicate entry in the "logon" Column. The new DataTable should have the same structure / columns as the old one. In practice, after the filter is applied and some matches are found, only the Rows.Count changes.

What have I tried so far

  • Dim distinctDT As DataTable = myDT.DefaultView.ToTable(True, "logon") --> Gives me just a DataTable with one Column ("logon)". Although filtered for unique values, it only contains one Column --> If i pass in an array with all my Column names, it doesn't filter for unique values anymore.

  • Dim names = From row In myDataTable.AsEnumerable() Select row.Field(Of String)("Name") Distinct --> Gives me an array of strings with the unique values --> I need a DataTable with same columns as before

CodePudding user response:

A couple of options that allow to filter the DataRows of a DataTable, based on the value of a specific Column, to generate a new DataTable with the resulting DataRows.

Considering - since you mentioned it - that is not important which DataRow is selected, i.e., any duplicate DataRow would do:
(if the DataRow to select becomes important at some point, you could also OrderBy() the grouping using the value of another Column, then pick the first - or the last - DataRow from the ordered collection)

Group DataRows by the value of a Column:

  • Group the DataRows of the source DataTable using the value of a Column
  • Select the first DataRow of each grouping
  • Call the CopyToDataTable() method to generate a new DataTable

Resulting in:

Dim newDt = [DataTable].AsEnumerable().
    GroupBy(Function(r) r("[Column Name]")).
    Select(Function(g) g.First()).
    CopyToDataTable()

Use a custom EqualityComparer:

  • Build a simple EqualityComparer class that compares the values of the same Column of two DataRows objects
  • Use the Distinct() method and pass the custom EqualityComparer, initialized with the name of the Column used as comparer
  • Call the CopyToDataTable() method

This method has the advantage that is reusable (i.e., you don't need to rebuild a query, just initialize the comparer with the name of the Column to compare)

Resulting in:

Dim newDt = [DataTable].AsEnumerable().
Distinct(New DataRowColumnComparer("[Column Name]")).
CopyToDataTable()

Custom EqualityComparer:
It's kind of a basic comparer. You can of course extend it to use different indexers (an integer representing the index of a Column, or a DataColumn reference).

Public Class DataRowColumnComparer
    Implements IEqualityComparer(Of DataRow)

    Private ReadOnly t As String = String.Empty

    Public Sub New(key As String)
        If String.IsNullOrEmpty(key) Then Throw New ArgumentException("Empty key")
        t = key
    End Sub

    Public Overloads Function Equals(dr1 As DataRow, dr2 As DataRow) As Boolean Implements IEqualityComparer(Of DataRow).Equals
        If dr1 Is Nothing AndAlso dr2 Is Nothing Then Return True
        If dr1 Is Nothing OrElse dr2 Is Nothing Then Return False
        Return dr1(t).Equals(dr2(t))
    End Function

    Public Overloads Function GetHashCode(dr As DataRow) As Integer Implements IEqualityComparer(Of DataRow).GetHashCode
        If dr(t) Is Nothing OrElse dr(t) Is DBNull.Value Then Return 0
        Return dr(t).GetHashCode()
    End Function
End Class
  • Related