Home > Back-end >  Excel VBA Searching Through a Large Array
Excel VBA Searching Through a Large Array

Time:02-12

I have an array of ~5,000 unique IDs loaded from a CSV file:

Dim wb As Workbook
Dim idRng As Variant

Set wb = Workbooks.Open(Filename:=ThisWorkbook.path & "\DataSource\ID.csv")
    
With wb.Sheets(1)
    idRng = .Range("A2:A" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
End With
    
wb.Close

Alongside this, I also load in ~100,000 rows of data, which contains non-unique IDs with numerous possible duplicates. My aim is to loop through the 100,000 rows and check if the corresponding rows ID is contained within the smaller array, and if so, add the rows data to a collection. Both IDs are stored as Longs. I have completed this using the below:

Dim dataRng As Variant
Set wb = Workbooks.Open(Filename:=ThisWorkbook.path & "\DataSource\data.csv")
    
With wb.Sheets(1)
    dataRng = .Range("A2:H" & .Range("A" & .Rows.Count).End(xlUp).Row).Value2
        
    For i = LBound(dataRng) To UBound(dataRng)
        If mUtil.IsInArray(dataRng(i, 1), idRng) Then
            'Add object to collection
        End If
    Next
End With

'mUtil
Public Function IsInArray(v As Variant, arr As Variant) As Boolean
    For i = LBound(arr) To UBound(arr)
        If arr(i, 1) = v Then
            IsInArray = True
            Exit Function
        End If
    Next
    
    IsInArray = False
End Function

Despite this working, as you can imagine iterating through the 5,000 unique IDs 100,000 times can take a fair amount of time, alongside this, the larger file can end up being much bigger.

Is there a more efficient way of performing this task, with the ultimate aim to reduce the run time?

CodePudding user response:

I'd suggest throwing your 5,000 records into a dictionary and then use the Exists method to check to see if it does in fact exist.

Public Sub DictionaryTest()
    Dim lngKey As Long, objDict As Object
    
    Set objDict = CreateObject("Scripting.Dictionary")
    
    lngKey = 123456
    
    objDict.Add lngKey, 0
    
    Debug.Print objDict.Exists(lngKey)
End Sub

It absolves you from having to loop over the 5,000 each time AND the power of the search within the dictionary should speed up the process 10 fold.

CodePudding user response:

You can try something as simple as the following. Instead of looping twice, just loop one of them and Match if the item is found in the other array. I just tested with random numbers and just looped the unique values. This would work only if you want the first match. If you want all the matches you need to simply reverse it and loop the 100k non-unique array to the unique one. What we do is create MatchArr as a Variant and then use that variable for our Application.Match function. If the function finds a match, it returns the row it found it on. If it doesn't find a match it will error, but because we made it a variant it won't stop the code. We simply check if it's an error or not and if it is then we simply move to the next line.

This is what I tried (Change as needed):

EDIT: I've updated to do the loop of the bigger array that needs to be refined.

Sub FindValues()

Dim Arr1, Arr2, MatchArr, i As Long, Col As New Collection

Arr1 = Sheet1.Range("A1:A50").Value
Arr2 = Sheet1.Range("C1:C1000").Value

For i = LBound(Arr2, 1) To UBound(Arr2, 1)
    MatchArr = Application.Match(Arr2(i, 1), Arr1, 0)
    If Not IsError(MatchArr) Then
        Col.Add Arr2(i, 1)
    End If
Next i

For i = 1 To Col.Count
    Sheet1.Range("E" & i).Value = Col(i)
Next i

End Sub
  • Related