Regex parse text with multiple recurring optional blocks

Time:08-09

I have configuration output that I want to parse with a regex. I want to capture every proxy-set definition, including its proxy-ip config, which is optional.

I have managed to get 2 matched groups with 1 proxy-ip, but the second proxy-ip isn't captured. It also doesn't capture a proxy-set that has no proxy-ip.

I tried the following regex in regex101: ((proxy-set.*?((proxy-ip.*?)exit).*?)) /gms

   proxy-set 2
    proxy-name "Teams"
    proxy-enable-keep-alive using-options
    proxy-load-balancing-method random-weights
    is-proxy-hot-swap enable
    tls-context-name "Teams"
    sbcipv4-sip-int-name "Teams"
    activate
    proxy-ip 0
     proxy-address "sip.pstnhub.microsoft.com:5061"
     transport-type tls
     priority 1
     weight 1
     activate
    exit
    proxy-ip 1
     proxy-address "sip.pstnhub.microsoft.com:5061"
     transport-type tls
     priority 1
     weight 1
     activate
    exit
   exit
   proxy-set 2
    proxy-name "Teams"
    proxy-enable-keep-alive using-options
    proxy-load-balancing-method random-weights
    is-proxy-hot-swap enable
    tls-context-name "Teams"
    sbcipv4-sip-int-name "Teams"
    activate
    proxy-ip 0
     proxy-address "sip.pstnhub.microsoft.com:5061"
     transport-type tls
     priority 1
     weight 1
     activate
    exit
    proxy-ip 1
     proxy-address "sip.pstnhub.microsoft.com:5061"
     transport-type tls
     priority 1
     weight 1
     activate
    exit
   exit
   proxy-set 3
    proxy-name "Teams"
    proxy-enable-keep-alive using-options
    proxy-load-balancing-method random-weights
    is-proxy-hot-swap enable
    tls-context-name "Teams"
    sbcipv4-sip-int-name "Teams"
    activate
   exit

Answer:

If you are using PowerShell, you can use 2 capture groups. Because capture group 2 sits inside a repeated group, its value only holds the last repetition; every repetition is available through that group's Captures property.

(?m)^[\p{Zs}\t]*(proxy-set\b.*(?:\r?\n(?![\p{Zs}\t]*proxy-(?:set|ip)\b).*)*)(?:\r?\n(proxy-ip\b.*(?:\r?\n(?![\p{Zs}\t]*proxy-(?:set|ip)\b).*)*\r?\nexit))*

Explanation

  • (?m) Inline modifier to enable multiline
  • ^ Start of a line (because of the (?m) modifier)
  • [\p{Zs}\t]* Match optional leading spaces or tabs
  • ( Capture group 1
    • proxy-set\b Match literally, followed by a word boundary
    • .* Match the rest of the line
    • (?: Non capture group to repeat as a whole part
      • \r?\n Match a newline
      • (?![\p{Zs}\t]*proxy-(?:set|ip)\b).* Match the whole line if it does not start with proxy-set or proxy-ip
    • )* Close the non capture group and optionally repeat
  • ) Close group 1
  • (?: Non capture group
    • \r?\n Match a newline
    • ( Capture group 2
      • proxy-ip\b.* Match proxy-ip, a word boundary, and the rest of the line
        • (?: Non capture group to repeat as a whole part
          • \r?\n Match a newline
          • (?![\p{Zs}\t]*proxy-(?:set|ip)\b).* Match the whole line if it does not start with proxy-set or proxy-ip
        • )* Close the non capture group and repeat as a whole part
        • \r?\n Match a newline
      • exit Match literally
    • ) Close group 2
  • )* Optionally repeat the outer non capture group; each repetition of capture group 2 is kept and can be retrieved using the Captures property
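
To see what the Captures property does for a repeated group, here is a minimal sketch on a toy input (the string 'a1b2c3' and the small pattern are made up for illustration and are not part of the configuration above). The group's Value holds only the last repetition, while Captures keeps all of them:

```powershell
# Toy input: three letter-digit pairs, matched by one repeated group.
$m = [regex]::Match('a1b2c3', '(?:([a-z])\d)+')

$m.Value                       # a1b2c3
$m.Groups[1].Value             # c  (only the last repetition)
$m.Groups[1].Captures.Count    # 3

# Every repetition of group 1, in order of appearance
($m.Groups[1].Captures | ForEach-Object { $_.Value }) -join ','   # a,b,c
```

The same mechanism is what lets group 2 in the pattern above return one capture per proxy-ip block.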


PowerShell example that prints the different values:

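# Read the whole file as one string so the pattern can match across lines
$txt = Get-Content -Raw file.txt

# Single quotes keep PowerShell from interpreting anything inside the pattern
$pattern = '(?m)^[\p{Zs}\t]*(proxy-set\b.*(?:\r?\n(?![\p{Zs}\t]*proxy-(?:set|ip)\b).*)*)(?:\r?\n(proxy-ip\b.*(?:\r?\n(?![\p{Zs}\t]*proxy-(?:set|ip)\b).*)*\r?\nexit))*'

Select-String -Pattern $pattern -InputObject $txt -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object {
        # Group 1: the proxy-set line and the settings that follow it
        Write-Host "`n------ group 1 value ------`n"
        Write-Host $_.Groups[1].Value

        # Group 2 repeats, so walk every capture instead of reading its Value
        foreach ($capture in $_.Groups[2].Captures) {
            Write-Host "`n------ group 2 capture value ------`n"
            Write-Host $capture.Value
        }
    }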

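The pattern above expects each proxy-ip line and its closing exit to start at the very beginning of a line. If those lines can have optional leading spaces, as in the indented sample in the question, allow [\p{Zs}\t]* before proxy-ip and before exit as well.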
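
A variant that tolerates leading spaces might look like the sketch below. It is an adaptation, not the original demo: it adds [\p{Zs}\t]* before proxy-ip and exit, and it makes the inner line repetition lazy (*?) so each capture stops at the first exit line. The sample text is a shortened, made-up version of the question's config, and the property names Header and ProxyIps are hypothetical.

```powershell
# Shortened sample with indentation, loosely modelled on the question's output
$txt = @'
   proxy-set 2
    proxy-name "Teams"
    activate
    proxy-ip 0
     priority 1
    exit
    proxy-ip 1
     priority 2
    exit
   exit
   proxy-set 3
    proxy-name "Other"
    activate
   exit
'@

# Same structure as the pattern above, plus optional leading whitespace
# before proxy-ip and exit, and a lazy repetition inside group 2.
$pattern = '(?m)^[\p{Zs}\t]*(proxy-set\b.*(?:\r?\n(?![\p{Zs}\t]*proxy-(?:set|ip)\b).*)*)' +
           '(?:\r?\n[\p{Zs}\t]*(proxy-ip\b.*(?:\r?\n(?![\p{Zs}\t]*proxy-(?:set|ip)\b).*)*?\r?\n[\p{Zs}\t]*exit))*'

# One object per proxy-set: its first line, plus the first line of each proxy-ip block
$sets = foreach ($m in [regex]::Matches($txt, $pattern)) {
    [pscustomobject]@{
        Header   = ($m.Groups[1].Value -split '\r?\n')[0].Trim()
        ProxyIps = @($m.Groups[2].Captures |
                       ForEach-Object { ($_.Value -split '\r?\n')[0].Trim() })
    }
}

$sets | ForEach-Object { '{0}: {1} proxy-ip block(s)' -f $_.Header, $_.ProxyIps.Count }
# proxy-set 2: 2 proxy-ip block(s)
# proxy-set 3: 0 proxy-ip block(s)
```

A proxy-set without any proxy-ip still produces a match; its group 2 simply has no captures.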
