I need guidance on the following. I have a file with 150000 records (excel). Another excel file with 5000-6000 records is received and need to delete the row based on some criteria of the info from the second file.
I use Dictionary function to collect the second file data in dictionary -
IntI = 2
Do While wbk.Sheets("Sheet1").Cells(IntI, 1).Value <> ""
strAgNo = wbk.Sheets("Sheet1").Cells(IntI, 8).Value
If Dict.Exists(strAgNo) Then
Else
Dict.Add Key:=strAgNo, Item:=IntI
End If
IntI = IntI 1
Loop
wbk.Close SaveChanges:=False
Then based on criteria of second file record, sering the first file using Range Find command (rgFound is Object)-
For n = 0 To Dict.Count - 1
strAgNo = Dict.Keys(n)
Set rgFound = Range("G:G").Find(strAgNo)
If rgFound Is Nothing Then
intNotSetlAg = intNotSetlAg 1
Else
FoundRow = rgFound.Row
intSetlAg = intSetlAg 1
Rows(FoundRow).Select
wbk.Sheets("Details").Rows(FoundRow).Delete
End If
Next n
This is working fine. However for 160000 to 180000 records in first file and 5 to 6K rows (to be deleted in first file) it takes 40-45 minutes. Need guidance for this in excel vba.
CodePudding user response:
Following from my comment above. This ran in ~20sec for me (150k rows of data, 5k random values to be deleted)
EDIT: refactored a bit...
Sub DeleteMatches()
Dim dict As Object, arr, n As Long, t
Dim col As New Collection
'create some sample data
With Sheet1.[A2:A150000]
.Formula = "=""Val_"" & TEXT(ROW()-1,""00000000"")"
.Value = .Value
End With
t = Timer
'load the ids to be deleted
'tested with 5k rows of `="Val_" & TEXT(RANDBETWEEN(1,150000),"00000000")`
Set dict = UniquesFromColumn(Sheet2.Range("A2"))
Debug.Print "Loaded Ids: " & Timer - t
'load the sheet1 id column into an array and scan through it,
' collecting any matched rows in the Collection
arr = Sheet1.Range("A1", Sheet1.Cells(Rows.Count, 1).End(xlUp)).Value
For n = 2 To UBound(arr, 1) 'skip header row if present
If dict.exists(arr(n, 1)) Then col.Add Sheet1.Cells(n, 1)
Next n
Debug.Print "Scanned sheet1 for matches: " & Timer - t
DeleteRows col 'delete the collected rows
Debug.Print "Deleted " & col.Count & " rows: " & Timer - t
End Sub
'return a dictionary of unique values from a column, starting at `startCell`
Function UniquesFromColumn(startCell As Range) As Object
Dim dict As Object, arr, n As Long, v
Set dict = CreateObject("scripting.dictionary")
With startCell.Parent
arr = .Range(startCell, .Cells(.Rows.Count, 1).End(xlUp)).Value
End With
For n = 1 To UBound(arr)
v = arr(n, 1)
If Len(v) > 0 Then dict(v) = dict(v) 1
Next n
Set UniquesFromColumn = dict
End Function
'delete all rows based on a collection of cells
Sub DeleteRows(col As Collection)
Dim rng As Range, n As Long, i As Long
If col.Count = 0 Then Exit Sub
'loop over the cells in the collection, building ranges for deletion
For n = col.Count To 1 Step -1
If rng Is Nothing Then
Set rng = col(n)
i = 1
Else
Set rng = Application.Union(rng, col(n))
i = i 1
If i > 200 Then 'union gets slow after a point, so delete and reset
rng.EntireRow.Delete
Set rng = Nothing
End If
End If
Next n
If Not rng Is Nothing Then rng.EntireRow.Delete 'any last rows?
End Sub
CodePudding user response:
Let Excel do all the dirty work using native features.
Logic:
- Use Remove duplicates to get unique entries from
Sheet1
- Store the above data in an array
- Store Col G from 2nd File in an array
- Search 1st array in 2nd array and if found replace value by say "DELME"
- Write the 2nd array back to the worksheet
- Autofilter column G on "DELME" and delete all rows in 1 go.
Test Conditions I used
The total number of unique IDs in File A
and the number of these IDs
in File B
will always influence the time taken by the code.
The other thing that will influence the time taken by the code is your hardware specs.
I test the below code on
- ★ CPU ★ Ryzen 5 5800X
- ★ GPU ★ NVIDIA GeForce RTX 3080 TI Founder Edition
- ★ RAM ★ G. Skill Trident Z RGB 32GB 3600MHz
See what time does this code give against your data?
File A
: 6000
records out of which there are 2500
unique values.
File B
: 150000
records which has 20830
duplicates values to be removed.
To do a stess test, I used these 2 files.
Code:
Here is the code that I tested
Option Explicit
'~~> This is the 2nd file. Change as applicable
Private Const fileA As String = "C:\Users\routs\Desktop\Delete Me Later\FileA.xlsx"
'~~> This is the 1st file. Change as applicable
Private Const fileB As String = "C:\Users\routs\Desktop\Delete Me Later\FileB.xlsx"
Sub Sample()
Debug.Print Now
Dim wbA As Workbook
Dim wsA As Worksheet
Set wbA = Workbooks.Open(fileA)
'~~> This is the relevant sheet
Set wsA = wbA.Sheets("Sheet1")
Dim lRow As Long
Dim lCol As Long
Dim arA As Variant
With wsA
'~~> Remove any filters
.AutoFilterMode = False
'~~> Find last row and last column
lCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
lRow = .Range("H" & .Rows.Count).End(xlUp).Row
'~~> Use Excel remove duplicates to delete duplicates
.Range("A1:" & Split(.Cells(, lCol).Address, "$")(1) & lRow).RemoveDuplicates Columns:=8, Header:=xlYes
'~~> Find the next last row
lRow = .Range("H" & .Rows.Count).End(xlUp).Row
'~~> Store the data in an array
arA = .Range("H2:H" & lRow).Value2
End With
Debug.Print "ID Array has " & lRow & " items"
wbA.Close (False)
Dim wbB As Workbook
Dim wsB As Worksheet
Set wbB = Workbooks.Open(fileB)
'~~> This is the relevant sheet
Set wsB = wbB.Sheets("Sheet1")
Dim arB As Variant
Dim lastCol As String
Dim oldRow As Long, newRow As Long
With wsB
'~~> Remove any filters
.AutoFilterMode = False
'~~> Find last row and last column
lCol = .Cells(1, .Columns.Count).End(xlToLeft).Column
lastCol = Split(.Cells(, lCol).Address, "$")(1)
lRow = .Range("G" & .Rows.Count).End(xlUp).Row
oldRow = lRow
Debug.Print "Main Array has " & lRow & " items"
'~~> Store the data in an array
arB = .Range("G2:G" & lRow).Value2
End With
Dim i As Long
Dim j As Long
Dim k As Long
For i = LBound(arA) To UBound(arA)
For j = LBound(arB) To UBound(arB)
If arB(j, 1) = arA(i, 1) Then arB(j, 1) = "DELME"
Next j
Next i
Dim Rng As Range
With wsB
.Range("G2").Resize(UBound(arB), 1).value = arB
Set Rng = .Range("A1:" & lastCol & lRow)
With Rng
'~~> Filter, offset(to exclude headers) and delete visible rows
With Rng
.AutoFilter Field:=7, Criteria1:="DELME"
.Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
End With
End With
'~~> Remove any filters
.AutoFilterMode = False
lRow = .Range("G" & .Rows.Count).End(xlUp).Row
newRow = lRow
Debug.Print "Total " & (oldRow - newRow) & " items were removed."
End With
Debug.Print Now
End Sub
Output
The code took 58 seconds on this particular test data.
08-12-2021 13:16:51
ID Array has 2500 items
Main Array has 150000 items
Total 20830 items were removed.
08-12-2021 13:17:49