I have an array of ~5,000 unique IDs loaded from a CSV file:
Dim wb As Workbook
Dim idRng As Variant

Set wb = Workbooks.Open(Filename:=ThisWorkbook.Path & "\DataSource\ID.csv")
With wb.Sheets(1)
    idRng = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
End With
wb.Close
Alongside this, I also load in ~100,000 rows of data, which contain non-unique IDs with numerous possible duplicates. My aim is to loop through the 100,000 rows, check whether each row's ID is contained within the smaller array, and if so, add that row's data to a collection. Both IDs are stored as Longs. I currently do this with the code below:
Dim dataRng As Variant
Dim i As Long

Set wb = Workbooks.Open(Filename:=ThisWorkbook.Path & "\DataSource\data.csv")
With wb.Sheets(1)
    dataRng = .Range("A2:H" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2

    For i = LBound(dataRng) To UBound(dataRng)
        If mUtil.IsInArray(dataRng(i, 1), idRng) Then
            'Add object to collection
        End If
    Next
End With
'mUtil
Public Function IsInArray(v As Variant, arr As Variant) As Boolean
    Dim i As Long

    For i = LBound(arr) To UBound(arr)
        If arr(i, 1) = v Then
            IsInArray = True
            Exit Function
        End If
    Next

    IsInArray = False
End Function
This works, but as you can imagine, scanning the 5,000 unique IDs once for every one of the 100,000 rows takes a fair amount of time, and the larger file can end up being much bigger still.
Is there a more efficient way of performing this task? The ultimate aim is to reduce the run time.
CodePudding user response:
I'd suggest throwing your 5,000 records into a Scripting.Dictionary and then using its Exists method to check whether a given ID is present.
Public Sub DictionaryTest()
    Dim lngKey As Long, objDict As Object

    Set objDict = CreateObject("Scripting.Dictionary")

    lngKey = 123456
    objDict.Add lngKey, 0               'the key is the ID; the item is unused here

    Debug.Print objDict.Exists(lngKey)  'prints True
End Sub
It absolves you from having to loop over the 5,000 IDs for every row, and because the dictionary's Exists check is a hash lookup rather than a linear scan, it should speed up the process 10 fold.
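Applied to your arrays, a minimal sketch might look like the following. It assumes idRng and dataRng are loaded exactly as in your question; the CLng conversion is only there because Value2 hands numeric cells back as Doubles, and it keeps the key types consistent.
'Minimal sketch of the dictionary approach, assuming idRng and dataRng are loaded as above
Dim idDict As Object
Dim i As Long

Set idDict = CreateObject("Scripting.Dictionary")

'Build the lookup once: key = ID, item is unused
For i = LBound(idRng, 1) To UBound(idRng, 1)
    If Not idDict.Exists(CLng(idRng(i, 1))) Then idDict.Add CLng(idRng(i, 1)), 0
Next i

'Single pass over the ~100,000 rows; each check is a hash lookup, not a scan of 5,000 values
For i = LBound(dataRng, 1) To UBound(dataRng, 1)
    If idDict.Exists(CLng(dataRng(i, 1))) Then
        'Add object to collection, exactly as in your original loop
    End If
Next i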
CodePudding user response:
You can try something as simple as the following. Instead of looping over both arrays, loop over just one of them and use Application.Match to check whether each item is found in the other array. I tested with random numbers and originally looped the unique values, but that only works if you want the first match; if you want all the matches, simply reverse it and loop the 100k non-unique array against the unique one.
What we do is declare MatchArr as a Variant and assign the result of Application.Match to it. If the function finds a match, it returns the position the value was found at. If it doesn't find a match, it returns an error value rather than raising a run-time error, and because MatchArr is a Variant it can hold that error value without stopping the code. We simply test it with IsError and, if no match was found, move on to the next row.
This is what I tried (change as needed):
EDIT: I've updated the code to loop over the bigger array, which is the one that needs to be filtered.
Sub FindValues()
    'Arr1, Arr2 and MatchArr are deliberately Variants; i is a Long
    Dim Arr1, Arr2, MatchArr, i As Long, Col As New Collection

    Arr1 = Sheet1.Range("A1:A50").Value      'the smaller, unique list
    Arr2 = Sheet1.Range("C1:C1000").Value    'the bigger list to be filtered

    For i = LBound(Arr2, 1) To UBound(Arr2, 1)
        MatchArr = Application.Match(Arr2(i, 1), Arr1, 0)
        If Not IsError(MatchArr) Then
            Col.Add Arr2(i, 1)
        End If
    Next i

    'Write the matched values out for inspection
    For i = 1 To Col.Count
        Sheet1.Range("E" & i).Value = Col(i)
    Next i
End Sub
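Adapted to the arrays from your question (idRng for the unique IDs, dataRng for the ~100,000 rows), a rough sketch would be something like this; MatchRes and Col are just illustrative names:
'Rough sketch only - assumes idRng and dataRng are loaded as in the question
Dim MatchRes As Variant
Dim i As Long
Dim Col As New Collection

For i = LBound(dataRng, 1) To UBound(dataRng, 1)
    MatchRes = Application.Match(dataRng(i, 1), idRng, 0)
    If Not IsError(MatchRes) Then
        'Add the row's data to the collection, as in the original loop
        Col.Add dataRng(i, 1)
    End If
Next i
Bear in mind that Application.Match still searches idRng for every row, so on very large files the dictionary approach above may scale better.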