Home > Net >  RegEx Negative Lookahead Replacement Excluding Starting and Ending Pattern
RegEx Negative Lookahead Replacement Excluding Starting and Ending Pattern

Time:10-12

I am using RegEx in VBA and trying to replace any character that is not a vertical pipe within a beginning and ending vertical pipe.

In other words, my text looks like this:

| This|Is|An|Example |

I only want to replace the non-pipe characters that appear after the first pipe and before the last pipe.

So, using an "a" as the replacement character for example:

|aaaaaaa|aa|aa|aaaaaaaaa|

I've tried a negative lookahead (the VBA implementation of RegEx doesn't support lookbehind), and while it works for excluding the characters after the last pipe from replacement, those before the first pipe still get replaced:

(?!^ *\|)[^\|](?! *$)

aaa|aaaaaaa|aa|aa|aaaaaaaaa|

I'm obviously not understanding the negative lookahead correctly because I can't seem to get those characters that appear prior to the first pipe to be excluded from the match.

Can someone please help me with this?

CodePudding user response:

If data follows the pattern (always at least two vertical pipes) here an idea with optional first part:

(^[^|]*\|)?[^|](?=[^|]*\|)

See this demo at regex101 (used \n in demo for not skipping lines) - replace with $1a

The part until the first pipe is captured to the first group $1 and inserted in replacement.
The lookahead checks after each [^|] (negated class) if there is another | pipe ahead.

CodePudding user response:

This is kind of a workaround, but instead of a complicated Regex pattern, which probably won't work in VBA, you can use a quick loop though the string and replace each section using a much simpler Regex pattern.

Sub Example()
    Const InputString As String = "    |   This|Is|An|Example  |    "
    Debug.Print DoTheThing(InputString)
    'Output:     |aaaaaaa|aa|aa|aaaaaaaaa|    
End Sub

Function DoTheThing(InputString As String) As String
    Dim Pieces() As String
    Pieces = Split(InputString, "|")
    
    Dim Regex As Object
    Set Regex = CreateObject("VBScript.RegExp")
    Regex.Global = True
    Regex.Pattern = "."
    
    If UBound(Pieces) > 1 Then
        Dim i As Long
        For i = 1 To UBound(Pieces) - 1
            Pieces(i) = Regex.Replace(Pieces(i), "a")
        Next
    End If
    
    DoTheThing = Join(Pieces, "|")
End Function

Or you can even avoid the use of the Regex Object entirely:

Sub Example()
    Const InputString As String = "    |   This|Is|An|Example  |    "
    Debug.Print DoTheThing(InputString)
    'Output:     |aaaaaaa|aa|aa|aaaaaaaaa|    
End Sub

Function DoTheThing(InputString As String) As String
    Dim Pieces() As String
    Pieces = Split(InputString, "|")
    
    If UBound(Pieces) > 1 Then
        Dim i As Long
        For i = 1 To UBound(Pieces) - 1
            Pieces(i) = String(Len(Pieces(i)), "a")
        Next
    End If
    
    DoTheThing = Join(Pieces, "|")
End Function

CodePudding user response:

The (?!^ *\|)[^\|](?! *$) regex matches

  • (?!^ *\|) - a location that is not immediately followed with zero or more spaces and then a | char
  • [^\|] - any char other than a | char that
  • (?! *$) - immediately to the right, there should be no optional spaces and then the end of string.

I suggest using another approach: match and capture what you need to keep and just match what you need to replace.

See the VBA code:

Dim pattern As regExp, m As Object
Dim text As String, result As String, repl As String, offset As Long

text = "    |   This|Is|An|Example  |    "
repl = "a"
offset = 0

Set pattern = New regExp
With pattern
    .pattern = "(^\  \||\|\  $)|[^|]"
    .Global = True
End With

result = text
For Each m In pattern.Execute(text)
    If Len(m.SubMatches(0)) = 0 Then ' If Group 1 matched, replace with "a"
        result = Left(result, m.FirstIndex   offset) & repl & Mid(result, m.FirstIndex   m.Length   1   offset)
        offset = offset   Len(repl) - m.Length
    End If
Next

Output: |aaaaaaa|aa|aa|aaaaaaaaa| .

The (^\ \||\|\ $)|[^|] regex matches

  • (^\ \||\|\ $) - Group 1: one or more chars and then a | at the start of string, or a | and then one or more chars till the end of string
  • | - or
  • [^|] - any char other than a | char.
  • Related