I need an alternative for regex lookbehind (negative and positive) in VBA

Time:12-07

I need to match every : in a text file, except when it is preceded by https, http or \. The problem is that VBA's regex engine does not support lookbehind.

With negative-lookbehind it should be (?<!http)(?<!https)(?<!\\)\:.

For some engines that don't support lookbehind it can be ([^https*][^\\])\K\:.

Neither works in VBA: the first regex raises error 5017, and the second one matches none of the : characters, although the code does not throw any errors.
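The failure can be reproduced in isolation. This is only a sketch: it assumes late binding through CreateObject, and the Sub name and the test string "a:b" are made up for illustration. The error is trapped so that its number can be printed instead of halting the macro.

```vba
Sub ShowLookbehindError()
    Dim re As Object, found As Boolean
    Set re = CreateObject("VBScript.RegExp")

    On Error Resume Next            ' trap the error instead of halting
    re.Pattern = "(?<!\\):"         ' lookbehind syntax, rejected by this engine
    found = re.Test("a:b")
    ' Print whatever error was raised (the question reports 5017)
    Debug.Print "Err.Number = " & Err.Number & " (" & Err.Description & ")"
    On Error GoTo 0
End Sub
```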

Based on the question "regEx positive lookbehind in VBA language", I tested this on a small example: myString = "BA", pattern = "[^B](A)", and then myString = rg.Replace(myString, "$1"). The expected result was "A", but the result obtained was "$1BA". What did I miss?

CodePudding user response:

The "trick" is to match what you don't want, but capture what you do want and return only the captured group. For example:

Sub regex()
    Dim RE As Object, MC As Object
    Const sPat As String = "B(A)"       ' match "BA", but capture only the "A"
    Const myString As String = "BA"

    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Pattern = sPat
        Set MC = .Execute(myString)
        Debug.Print MC(0).SubMatches(0) ' capture group 1 of the first match
    End With
End Sub

This prints A in the Immediate window.
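The same idea can stand in for a positive lookbehind in a replacement: capture the required prefix and write it back with $1. A minimal sketch, assuming late binding via CreateObject; the Sub name and the sample string "BA CA" are made up for illustration:

```vba
Sub EmulatePositiveLookbehind()
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Stands in for (?<=B)A: the "B" is consumed by the match, then restored via $1
    re.Pattern = "(B)A"
    Debug.Print re.Replace("BA CA", "$1X")   ' prints BX CA
End Sub
```

Unlike a real lookbehind, the prefix is consumed by the match, so two matches that would need to share a character are not both found.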

CodePudding user response:

You can use

' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim pattern As regExp, m As Object
Dim text As String, result As String, repl As String, offset As Long

text = "http://www1 https://www2 \: : text:..."
repl = "_"
offset = 0

Set pattern = New regExp
With pattern
    .pattern = "(https?:|\\:)|:"
    .Global = True
End With

result = text
For Each m In pattern.Execute(text)
    If Len(m.SubMatches(0)) = 0 Then ' Group 1 did not match, so this is a bare ":" to replace
        result = Left(result, m.FirstIndex + offset) & repl & Mid(result, m.FirstIndex + m.Length + 1 + offset)
        offset = offset + Len(repl) - m.Length
    End If
Next

For the input http://www1 https://www2 \: : text:... the output is http://www1 https://www2 \: _ text_....

The point is to match and capture https:, http: or \: with the (https?:|\\:) capturing group so that those matches can be skipped, and to replace only a bare : (a match where Group 1 is empty) inline while iterating over the matches. The offset tracks how the string length changes as replacements are made, which matters when the replacement string has a different length from the match.
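If this is needed in more than one place, the loop can be wrapped in a function. The following is only a sketch: the name ReplaceBareColons is hypothetical, and it uses late binding (CreateObject) so that no library reference has to be set.

```vba
' Hypothetical helper: replaces every ":" that is not part of "http:", "https:" or "\:".
Function ReplaceBareColons(ByVal text As String, ByVal repl As String) As String
    Dim re As Object, m As Object
    Dim result As String, offset As Long

    Set re = CreateObject("VBScript.RegExp")   ' late binding, no reference required
    re.Pattern = "(https?:|\\:)|:"
    re.Global = True

    result = text
    For Each m In re.Execute(text)
        ' Group 1 is empty only when the bare ":" alternative matched
        If Len(m.SubMatches(0)) = 0 Then
            result = Left$(result, m.FirstIndex + offset) & repl & _
                     Mid$(result, m.FirstIndex + m.Length + 1 + offset)
            ' Later match positions shift by the change in length
            offset = offset + Len(repl) - m.Length
        End If
    Next m

    ReplaceBareColons = result
End Function
```

Calling Debug.Print ReplaceBareColons("http://www1 https://www2 \: : text:...", "_") should print the same output as above.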
