Home > Back-end >  What is the optimal way to filter rows from a DataTable?
What is the optimal way to filter rows from a DataTable?

Time:01-13

I'm looking for the easiest and fastest way to get DataRows from a DataTable. These DataRows have several columns including one with Integer and I only want the rows with the highest value.

Currently i get this result like this :

Dim maxValue = 0
For Each row In mDataTable.Rows
If row.valueCell > maxValue Then
    maxValue = row.valueCell
    End If
Next
Dim mDataTableBis = mDataTable.Clone
For Each row In mDataTable.Select("value = " & valueCell)
    mDataTableBis.ImportRow(row)
Next

Original DataTable (for example):

Rows letters value
row 1 (wanted) x 4
row 2 y 2
row 3 (wanted) z 4

CodePudding user response:

If you prefer the for-each loop approach, like it seems from the code you posted, use this function:

Public Sub ForEachLoop()

    Dim maxValue As Integer

    ' Loop through rows to find max value
    For Each row As DataRow In mDataTable.Rows
        Dim currentRowValue As Integer = row.Field(Of Integer)(numbersColumn)
        If currentRowValue > maxValue Then
            maxValue = currentRowValue
        End If
    Next

    ' Create a List of DataRow
    Dim res As New List(Of DataRow)

    ' Loop through rows again to add to list each row in which numbersColumn field = maxValue
    For Each row As DataRow In mDataTable.Rows
        If row.Field(Of Integer)(numbersColumn) = maxValue Then
            res.Add(row)
        End If
    Next

    ' Create results DataTable copying the List to a new DataTable
    Dim result As DataTable = res.CopyToDataTable()

End Sub

If you would like something more concise, you can try with LINQ but this is a more advanced solution, surely not beginner-friendly.

Something like:

Public Sub LINQ()

    ' Find the row with highest value in numbersColumn, then get the value from the field
    Dim maxValue As Integer = mDataTable.AsEnumerable().MaxBy(Function(x) x.Field(Of Integer)(numbersColumn)).Field(Of Integer)(numbersColumn)

    ' Select all the rows with numbersColumn value = maxValue (this returns an IEnumerable of DataRow)
    Dim resultsRows As IEnumerable(Of DataRow) = mDataTable.AsEnumerable().Where(Function(x) x.Field(Of Integer)(numbersColumn) = maxValue)

    ' Create results DataTable copying the IEnumerable to a new DataTable
    Dim result As DataTable = resultsRows.CopyToDataTable()

End Sub

You could also use the DataTable.Select() method...

Public Sub DataTableSelect()

    Dim result As DataTable = mDataTable.Select("numbersColumn = max(numbersColumn)").CopyToDataTable()

End Sub

...but this is significally slower than a for each loop:

Benchmark

As you can see, the for each version is around 3500 times faster than the Select() version - that's a big difference! This test has been executed on a relatively small DataTable with around 10k rows - imagine how big the difference could be in a million-rows-DataTable.

  • Related