I'm faced with an interesting data validation challenge, every month my manager would perform a data verification activity on up 20k rows of data (this is entirely manual) where she looks for duplicate values in one column (EventID), if there is a duplicate compare the value of another column (subType) if the values are not the same. Both rows are copied to a different sheet for a secondary process.
I am still learning VBA, so I took this as a growing activity as it will give my manager a lot of time back in her day.
I have the code I am working on currently but I can see that I have started to go down the wrong path, is anyone able to assist?
Sub Find_changes ()
Dim eventID As Range
Dim subtype As Range
Dim cell As Range
Dim LastRow As Long
Dim Reader As Worksheet
Dim Writer As Worksheet
Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
Set eventID = Reader.Range("b:b")
Set subtype = Reader.Range("j:j")
Let LastRow = Writer.Cells(Rows.Count, 1).End(xlUp).Row 1
For Each cell In eventID
If eventID = eventID And subtype <> subtype Then
cell.EntireRow.Copy Destination:=Writer.Range(LastRow)
End If
Next
End Sub
I have also included a mock data set below:
Ideally what I would like to achieve (using the picture as an example) is to:
- Read Column B (EventID) to identify duplicates
- On duplicate (824466) compare values column J (Report SubType)
- If values are different (SubType 1 and SubType 2 is this example)
- Copy both rows to sperate sheet
CodePudding user response:
Please, try the next way which should be fast enough for larger ranges. It uses a Scripting Dictionary
to find duplicates and place a marker in a column outside the used range, based on what a range is created (using SpecialCells(xlCellTypeConstants)
) which will be copied at once, at the end:
Sub Find_changes()
Dim Reader As Worksheet, LastRow As Long, Writer As Worksheet, rngCopy As Range
Dim arrID, arrType, arrMark, i As Long, maxCol As Long, maxRow As Long, dict As Object
Set Reader = ThisWorkbook.Worksheets(2)
Set Writer = ThisWorkbook.Worksheets(3)
LastRow = Writer.Range("B" & Writer.rows.count).End(xlUp).Row 1
maxRow = Reader.Range("B" & Reader.rows.count).End(xlUp).Row
maxCol = Reader.UsedRange.Columns.count 20
arrID = Reader.Range("B1:B" & maxRow).value 'plase the range in an array for faster iteration
arrType = Reader.Range("J1:J" & maxRow).value 'plase the range in an array for faster iteration
ReDim arrMark(1 To maxRow, 1 To 1)
Set dict = CreateObject("Scripting.Dictionary")
'Place the duplicates in a dictionary and mark their rows:
For i = 1 To maxRow
If Not dict.Exists(arrID(i, 1)) Then
dict.Add arrID(i, 1), Array(arrType(i, 1), i)
Else
If dict(arrID(i, 1))(0) <> arrType(i, 1) Then
arrMark(i, 1) = "Copy" 'write "Copy" in arrMark
arrMark(dict(arrID(i, 1))(1), 1) = "Copy" 'write "Copy" in arrMark
End If
End If
Next
'drop the marker array after the last column:
Reader.cells(1, maxCol).Resize(UBound(arrMark), 1).value = arrMark
'Extract the range to be copied (at once)
On Error Resume Next
Set rngCopy = Reader.Range(Reader.cells(1, maxCol), Reader.cells(maxRow, maxCol)).SpecialCells(xlCellTypeConstants)
On Error GoTo 0
If Not rngCopy Is Nothing Then
rngCopy.ClearContents
rngCopy.EntireRow.Copy Writer.Range("A" & LastRow)
End If
MsgBox "Ready..."
End Sub
I tried commenting the code lines, but if something not clear enough, please do not hesitate to ask for clarification.
CodePudding user response:
For large numbers of cells, it's often beneficial to store ranges as arrays, and work with them in VBA memory rather than working with them straight in Excel. You'll find that working with data in VBA memory is much faster than dealing with Excel directly.
Below is a piece of code (that very much needs modifications to be used by you) that will get you in the right direction. I've added notes below the code snippet to explain the methodology, and point out what more you need to do to make this work for you.
Option Explicit
Sub Find_changes_modified()
' Reference the sheet with the data
Dim Reader As Worksheet
Set Reader = ThisWorkbook.Worksheets(2)
' Store the entire dataset as a range
Dim RangeReader As Range
Set RangeReader = Reader.Range("A1:J6") ' ***
' Sort the range from lowest to highest EventID, and lowest to highest Report Subtype
' Sorting the range allows us to compare EventIDs that are next to one another
With Reader.Sort
.SortFields.Clear
.SortFields.Add2 Key:=Range("B2:B6"), Order:=xlAscending ' ***
.SortFields.Add2 Key:=Range("J2:J6"), Order:=xlAscending ' ***
.SetRange RangeReader
.Header = xlYes
.Apply
End With
' Store the entire range as an array
Dim ArrayReader() As Variant
ArrayReader = RangeReader.Value
' Column numbers of columns in the array
Dim ColumnNumberEventID As Long, ColumnNumberSubtype As Long
ColumnNumberEventID = 2 ' ***
ColumnNumberSubtype = 10 ' ***
' Store all duplicates in another array
' Make ArrayWriter the same size as ArrayReader
Dim ArrayWriter() As Variant
ReDim ArrayWriter(1 To UBound(ArrayReader, 1), 1 To UBound(ArrayReader, 2))
Dim Duplicates As Long
Duplicates = 0
' Iterate through the array and check for duplicates in the EventID column
Dim ii As Long, jj As Long
Dim ThisEventID As String, NextEventID As String, ThisSubType As String, NextSubType As String
For ii = LBound(ArrayReader, 1) To UBound(ArrayReader, 1) - 1
ThisEventID = ArrayReader(ii, ColumnNumberEventID)
NextEventID = ArrayReader(ii 1, ColumnNumberEventID)
If ThisEventID = NextEventID Then
ThisSubType = ArrayReader(ii, ColumnNumberSubtype)
NextSubType = ArrayReader(ii 1, ColumnNumberSubtype)
If ThisSubType <> NextSubType Then
Duplicates = Duplicates 1
' Copy all of the row's information to the ArrayWriter
For jj = LBound(ArrayReader, 2) To UBound(ArrayReader, 2)
ArrayWriter(Duplicates, jj) = ArrayReader(ii, jj)
Next jj
End If
End If
Next ii
' Reference the sheet to write the duplicate data
Dim Writer As Worksheet
Set Writer = ThisWorkbook.Worksheets(3)
' Write the contents of the ArrayWriter to the other sheet
Writer.Range("A1:J1").Value = ArrayWriter
End Sub
The big picture is to sort the data according to 2 columns (Event ID and Report Subtype), and then compare the data row-by-row to its neighbors. This sorting means that we only need to compare each row of data to its neighbor rather than check multiple rows against multiple rows every time.
All places in this code with a comment ***
indicate something that needs to be changed for actual use. I've used many hardcoded values just to lay out how this works.
We start by creating a reference to the sheet with the data, Reader
, just as you've done originally. Next, we store the data in a range, RangeReader
. Using this range, we sort the data according to the Event ID column, and then according to the Report Subtype column.
With the range sorted, we store it as an array, ArrayReader
, so we can work with it in VBA memory. We will also need an array to store any duplicate data rows we come across, ArrayWriter
. Then, iterate through the contents of ArrayReader
and compare each row to its neighbor. If a row and its neighbor meet the criteria, add it to the ArrayWriter
. After all of this, write the ArrayWriter
information to the sheet.
Some things for you to consider:
- Can you find a programmatic way to determine the cells of
RangeReader
rather than just typing them manually into the code? - Can you find a programmatic way to determine each of the ranges used to sort
RangeReader
? - Will
ColumnNumberEventID
andColumnNumberSubType
always have the values of 2 and 10, respectively? If not, how can you make sure your code is always referencing the correct columns? - When this code finds a duplicate, it is only storing one of the items. You probably also want to store that other item that is the duplicate.