Home > Software engineering >  VBA Append unique regular expressions to string variable
VBA Append unique regular expressions to string variable

Time:05-12

How can I grab matching regular expressions from a string, remove the duplicates, and append them to a string variable that separates each by a comma?

For example, in the string, "this is an example of the desired regular expressions: BPOI-G8J7R9, BPOI-G8J7R9 and BPOI-E5Q8D2" the desired output string would be "BPOI-G8J7R9,BPOI-E5Q8D2"

I have attempted to use a dictionary to remove the duplicates, but my function is spitting out the dreaded #Value error.

Can anyone see where I'm going wrong here? Or is there any suggestion for a better way of going about this task?

Code below:

Public Function extractexpressions(ByVal text As String) As String
Dim regex, expressions, expressions_dict As Object, result As String, found_expressions As Variant, i As Long

Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[A-Z][A-Z][A-Z][A-Z][-]\w\w\w\w\w\w"
regex.Global = True

Set expressions_dict = CreateObject("Scripting.Dictionary")

If regex.Test(text) Then
    expressions = regex.Execute(text)
End If

For Each item In expressions
    If Not expressions_dict.exists(item) Then expressions_dict.Add item, 1
Next

found_expressions = expressions_dict.items

result = ""

For i = 1 To expressions_dict.Count - 1
    result = result & found_expressions(i) & ","
Next i

extractexpressions = result

End Function

CodePudding user response:

If you call your function from a Sub you will be able to debug it.

See the comment below about adding the matches as keys to the dictionary - if you add the match object itself, instead of explicitly specifying the match's value property, your dictionary won't de-duplicate your matches (because two or more match objects with the same value are still distinct objects).

Sub Tester()
    Debug.Print extractexpressions("ABCD-999999 and DFRG-123456 also ABCD-999999 blah")
End Sub


Public Function extractexpressions(ByVal text As String) As String
    Dim regex As Object, expressions As Object, expressions_dict As Object
    Dim item
    
    Set regex = CreateObject("VBScript.RegExp")
    regex.Pattern = "[A-Z]{4}-\w{6}"
    regex.Global = True
    
    If regex.Test(text) Then
        Set expressions = regex.Execute(text)
        Set expressions_dict = CreateObject("Scripting.Dictionary")
        For Each item In expressions
            'A dictionary can have object-type keys, so make sure to add the match *value*
            '  and the not match object itself
            If Not expressions_dict.Exists(item.Value) Then expressions_dict.Add item.Value, 1
        Next
        extractexpressions = Join(expressions_dict.Keys, ",")
    End If
End Function

CodePudding user response:

VBA's regex object actually supports the backreference to a previous capture group. Hence we can get all the unique items through the expression itself:

([A-Z]{4}-\w{6})(?!.*\1)

See an online enter image description here

  • Related