Home > Mobile >  Excel Data Verification Assistance
Excel Data Verification Assistance

Time:05-17

I'm faced with an interesting data validation challenge, every month my manager would perform a data verification activity on up 20k rows of data (this is entirely manual) where she looks for duplicate values in one column (EventID), if there is a duplicate compare the value of another column (subType) if the values are not the same. Both rows are copied to a different sheet for a secondary process.

I am still learning VBA, so I took this as a growing activity as it will give my manager a lot of time back in her day.

I have the code I am working on currently but I can see that I have started to go down the wrong path, is anyone able to assist?

Sub Find_changes ()

Dim eventID As Range
Dim subtype As Range
Dim cell As Range
Dim LastRow As Long
Dim Reader As Worksheet
Dim Writer As Worksheet

Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
Set eventID = Reader.Range("b:b")
Set subtype = Reader.Range("j:j")

Let LastRow = Writer.Cells(Rows.Count, 1).End(xlUp).Row   1

For Each cell In eventID
    If eventID = eventID And subtype <> subtype Then
        cell.EntireRow.Copy Destination:=Writer.Range(LastRow)
                
    End If
 Next
    
End Sub

I have also included a mock data set below:

Mock Data Set

Ideally what I would like to achieve (using the picture as an example) is to:

  • Read Column B (EventID) to identify duplicates
  • On duplicate (824466) compare values column J (Report SubType)
  • If values are different (SubType 1 and SubType 2 is this example)
  • Copy both rows to sperate sheet

CodePudding user response:

Please, try the next way which should be fast enough for larger ranges. It uses a Scripting Dictionary to find duplicates and place a marker in a column outside the used range, based on what a range is created (using SpecialCells(xlCellTypeConstants)) which will be copied at once, at the end:

Sub Find_changes()
 Dim Reader As Worksheet, LastRow As Long, Writer As Worksheet, rngCopy As Range
 Dim arrID, arrType, arrMark, i As Long, maxCol As Long, maxRow As Long, dict As Object

 Set Reader = ThisWorkbook.Worksheets(2)
 Set Writer = ThisWorkbook.Worksheets(3)
 LastRow = Writer.Range("B" & Writer.rows.count).End(xlUp).Row   1

 maxRow = Reader.Range("B" & Reader.rows.count).End(xlUp).Row
 maxCol = Reader.UsedRange.Columns.count   20

 arrID = Reader.Range("B1:B" & maxRow).value    'plase the range in an array for faster iteration
 arrType = Reader.Range("J1:J" & maxRow).value 'plase the range in an array for faster iteration
 ReDim arrMark(1 To maxRow, 1 To 1)

 Set dict = CreateObject("Scripting.Dictionary")
 'Place the duplicates in a dictionary and mark their rows:
 For i = 1 To maxRow
    If Not dict.Exists(arrID(i, 1)) Then
        dict.Add arrID(i, 1), Array(arrType(i, 1), i)
    Else
        If dict(arrID(i, 1))(0) <> arrType(i, 1) Then
            arrMark(i, 1) = "Copy"                         'write "Copy" in arrMark
            arrMark(dict(arrID(i, 1))(1), 1) = "Copy" 'write "Copy" in arrMark
        End If
    End If
 Next

 'drop the marker array after the last column:
 Reader.cells(1, maxCol).Resize(UBound(arrMark), 1).value = arrMark

 'Extract the range to be copied (at once)
 On Error Resume Next
  Set rngCopy = Reader.Range(Reader.cells(1, maxCol), Reader.cells(maxRow, maxCol)).SpecialCells(xlCellTypeConstants)
 On Error GoTo 0
 If Not rngCopy Is Nothing Then
    rngCopy.ClearContents
    rngCopy.EntireRow.Copy Writer.Range("A" & LastRow)
 End If
 MsgBox "Ready..."
End Sub

I tried commenting the code lines, but if something not clear enough, please do not hesitate to ask for clarification.

CodePudding user response:

For large numbers of cells, it's often beneficial to store ranges as arrays, and work with them in VBA memory rather than working with them straight in Excel. You'll find that working with data in VBA memory is much faster than dealing with Excel directly.

Below is a piece of code (that very much needs modifications to be used by you) that will get you in the right direction. I've added notes below the code snippet to explain the methodology, and point out what more you need to do to make this work for you.

Option Explicit

Sub Find_changes_modified()

    ' Reference the sheet with the data
    Dim Reader As Worksheet
    Set Reader = ThisWorkbook.Worksheets(2)

    ' Store the entire dataset as a range
    Dim RangeReader As Range
    Set RangeReader = Reader.Range("A1:J6") ' ***
    
    ' Sort the range from lowest to highest EventID, and lowest to highest Report Subtype
    ' Sorting the range allows us to compare EventIDs that are next to one another
    With Reader.Sort
        .SortFields.Clear
        .SortFields.Add2 Key:=Range("B2:B6"), Order:=xlAscending ' ***
        .SortFields.Add2 Key:=Range("J2:J6"), Order:=xlAscending ' ***
        .SetRange RangeReader
        .Header = xlYes
        .Apply
    End With

    ' Store the entire range as an array
    Dim ArrayReader() As Variant
    ArrayReader = RangeReader.Value
    
    ' Column numbers of columns in the array
    Dim ColumnNumberEventID As Long, ColumnNumberSubtype As Long
    ColumnNumberEventID = 2 ' ***
    ColumnNumberSubtype = 10 ' ***
    
    ' Store all duplicates in another array
    ' Make ArrayWriter the same size as ArrayReader
    Dim ArrayWriter() As Variant
    ReDim ArrayWriter(1 To UBound(ArrayReader, 1), 1 To UBound(ArrayReader, 2))
    
    Dim Duplicates As Long
    Duplicates = 0
    
    ' Iterate through the array and check for duplicates in the EventID column
    Dim ii As Long, jj As Long
    Dim ThisEventID As String, NextEventID As String, ThisSubType As String, NextSubType As String
    For ii = LBound(ArrayReader, 1) To UBound(ArrayReader, 1) - 1
    
        ThisEventID = ArrayReader(ii, ColumnNumberEventID)
        NextEventID = ArrayReader(ii   1, ColumnNumberEventID)
    
        If ThisEventID = NextEventID Then
        
            ThisSubType = ArrayReader(ii, ColumnNumberSubtype)
            NextSubType = ArrayReader(ii   1, ColumnNumberSubtype)
        
            If ThisSubType <> NextSubType Then
            
                Duplicates = Duplicates   1
            
                ' Copy all of the row's information to the ArrayWriter
                For jj = LBound(ArrayReader, 2) To UBound(ArrayReader, 2)
                
                    ArrayWriter(Duplicates, jj) = ArrayReader(ii, jj)
                
                Next jj
            
            End If
        
        End If
    
    Next ii
    
    ' Reference the sheet to write the duplicate data
    Dim Writer As Worksheet
    Set Writer = ThisWorkbook.Worksheets(3)
    
    ' Write the contents of the ArrayWriter to the other sheet
    Writer.Range("A1:J1").Value = ArrayWriter
    
End Sub

The big picture is to sort the data according to 2 columns (Event ID and Report Subtype), and then compare the data row-by-row to its neighbors. This sorting means that we only need to compare each row of data to its neighbor rather than check multiple rows against multiple rows every time.

All places in this code with a comment *** indicate something that needs to be changed for actual use. I've used many hardcoded values just to lay out how this works.

We start by creating a reference to the sheet with the data, Reader, just as you've done originally. Next, we store the data in a range, RangeReader. Using this range, we sort the data according to the Event ID column, and then according to the Report Subtype column.

With the range sorted, we store it as an array, ArrayReader, so we can work with it in VBA memory. We will also need an array to store any duplicate data rows we come across, ArrayWriter. Then, iterate through the contents of ArrayReader and compare each row to its neighbor. If a row and its neighbor meet the criteria, add it to the ArrayWriter. After all of this, write the ArrayWriter information to the sheet.

Some things for you to consider:

  • Can you find a programmatic way to determine the cells of RangeReader rather than just typing them manually into the code?
  • Can you find a programmatic way to determine each of the ranges used to sort RangeReader?
  • Will ColumnNumberEventID and ColumnNumberSubType always have the values of 2 and 10, respectively? If not, how can you make sure your code is always referencing the correct columns?
  • When this code finds a duplicate, it is only storing one of the items. You probably also want to store that other item that is the duplicate.
  • Related