How can I grab matching regular expressions from a string, remove the duplicates, and append them to a string variable that separates each by a comma?
For example, in the string, "this is an example of the desired regular expressions: BPOI-G8J7R9, BPOI-G8J7R9 and BPOI-E5Q8D2" the desired output string would be "BPOI-G8J7R9,BPOI-E5Q8D2"
I have attempted to use a dictionary to remove the duplicates, but my function is spitting out the dreaded #Value error.
Can anyone see where I'm going wrong here? Or is there any suggestion for a better way of going about this task?
Code below:
Public Function extractexpressions(ByVal text As String) As String
Dim regex, expressions, expressions_dict As Object, result As String, found_expressions As Variant, i As Long
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[A-Z][A-Z][A-Z][A-Z][-]\w\w\w\w\w\w"
regex.Global = True
Set expressions_dict = CreateObject("Scripting.Dictionary")
If regex.Test(text) Then
expressions = regex.Execute(text)
End If
For Each item In expressions
If Not expressions_dict.exists(item) Then expressions_dict.Add item, 1
Next
found_expressions = expressions_dict.items
result = ""
For i = 1 To expressions_dict.Count - 1
result = result & found_expressions(i) & ","
Next i
extractexpressions = result
End Function
CodePudding user response:
If you call your function from a Sub you will be able to debug it.
See the comment below about adding the matches as keys to the dictionary - if you add the match object itself, instead of explicitly specifying the match's value
property, your dictionary won't de-duplicate your matches (because two or more match
objects with the same value
are still distinct objects).
Sub Tester()
Debug.Print extractexpressions("ABCD-999999 and DFRG-123456 also ABCD-999999 blah")
End Sub
Public Function extractexpressions(ByVal text As String) As String
Dim regex As Object, expressions As Object, expressions_dict As Object
Dim item
Set regex = CreateObject("VBScript.RegExp")
regex.Pattern = "[A-Z]{4}-\w{6}"
regex.Global = True
If regex.Test(text) Then
Set expressions = regex.Execute(text)
Set expressions_dict = CreateObject("Scripting.Dictionary")
For Each item In expressions
'A dictionary can have object-type keys, so make sure to add the match *value*
' and the not match object itself
If Not expressions_dict.Exists(item.Value) Then expressions_dict.Add item.Value, 1
Next
extractexpressions = Join(expressions_dict.Keys, ",")
End If
End Function
CodePudding user response:
VBA's regex object actually supports the backreference to a previous capture group. Hence we can get all the unique items through the expression itself:
([A-Z]{4}-\w{6})(?!.*\1)