How to do global replace with -replace in Powershell?

Time:01-15

I am new to PowerShell, so please bear with me.

I have this pattern

(.*?)(\d{3})(.*?:\r?\n)(?!\2)(\d{3})

to match this text:

111 is different from:
111 is different from:
123 is different from:
567.

This yields only 1 match, even though there are 2 instances. How can I get both? The pattern consumes the 123 in the first match, so it can't be found again. I had to repeat the line several times to work around this, but I believe there are better ways. Please help.

I tried changing the 123 part of the pattern into a lookahead, but then I couldn't capture the 123.
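To see the consumption problem concretely, the following sketch counts the matches of the original pattern with [regex]::Matches(). It assumes the sample text uses LF line endings (the pattern itself also allows CRLF).

```powershell
# Sample input from the question, built with LF ("`n") line endings
$text = "111 is different from:`n111 is different from:`n123 is different from:`n567."

# The original pattern: the 4th group *consumes* the next line's digits
$pattern = '(.*?)(\d{3})(.*?:\r?\n)(?!\2)(\d{3})'

# $found is used instead of $matches, which is a PowerShell automatic variable
$found = [regex]::Matches($text, $pattern)
$found.Count                 # only 1 match
$found[0].Groups[4].Value    # the '123' that the 2nd instance would need
```

Because the single match ends after "123", the search for the next match starts at " is different from:" on that line, so the second instance can never begin with 123.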

Goal: I want to insert a line (a sentence) between two consecutive lines whose values differ.

EDIT: The desired output looks like this:

111 is different from:
111 is different from:
   *** This value 123 ***
123 is different from:
  *** This value 567 ***
567. 

Answer:

You can use 2 capture groups: use the first group in a negative lookahead, and the second group to build the result in the replacement.

^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)

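In the replacement, use the full match and group 2 (for PowerShell, prefix the pattern with (?m) so that ^ matches at the start of every line, and supply a real newline such as "`n" in place of \n, which the .NET substitution syntax does not recognize):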

$0\n   *** This value $2 ***

See a .NET regex101 demo.
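PowerShell's -replace does not enable multiline mode by default, and .NET substitution strings have no \n escape, so running this answer's regex from PowerShell needs two small adjustments. A minimal sketch, assuming LF input:

```powershell
$text = "111 is different from:`n111 is different from:`n123 is different from:`n567."

# Same pattern as above, with (?m) so that ^ matches at the start of every line
$pattern = '(?m)^(\d{3})\b.*:(?=\r?\n(?!\1)(\d{3})\b)'

# '$0' and '$2' are single-quoted so PowerShell leaves them for -replace;
# "`n" is double-quoted so PowerShell turns it into a real LF character
$replacement = '$0' + "`n" + '   *** This value $2 ***'

$result = $text -replace $pattern, $replacement
$result
```

The match ends at the colon and the lookahead leaves the original line break untouched, so the inserted text lands on its own line right before the differing value.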

Output

111 is different from:
111 is different from:
   *** This value 123 ***
123 is different from:
   *** This value 567 ***
567.

If you instead want to match only the position at the start of the line, asserting that the next line does not start with the same digits as the current one, put the whole pattern inside a positive lookahead assertion:

^(?=(\d{3}\b)(.*:\r?\n)(?!\1)(\d{3})\b)

See another .NET regex101 demo.
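Because everything in this variant sits inside the lookahead, each match is zero-length and consumes nothing, yet the captured values remain available through the groups. A small sketch to inspect them, again with (?m) added for PowerShell and assuming LF input:

```powershell
$text = "111 is different from:`n111 is different from:`n123 is different from:`n567."
$pattern = '(?m)^(?=(\d{3}\b)(.*:\r?\n)(?!\1)(\d{3})\b)'

$found = [regex]::Matches($text, $pattern)
foreach ($m in $found) {
  # Index = start of a line whose successor begins with different digits;
  # Length is always 0 because the whole pattern is a lookahead
  'index {0}, length {1}, next value {2}' -f $m.Index, $m.Length, $m.Groups[3].Value
}
```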

Answer:

Note that PowerShell's -replace operator is invariably global, i.e. it always looks for and replaces all matches of the given regex.
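A quick illustration of that global behavior, contrasted with [regex]::Match(), which stops at the first match:

```powershell
# -replace substitutes every match, not just the first one
$all = 'a1b2c3' -replace '\d', '#'
$all      # a#b#c#

# By contrast, [regex]::Match() returns only the first match
$first = [regex]::Match('a1b2c3', '\d').Value
$first    # 1
```

So the single match in the question is caused by the pattern consuming too much text, not by -replace stopping early.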

Use the following -replace operation instead:

@'
111 is different from:
111 is different from:
123 is different from:
567.
'@ -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', 
            '$0  *** This value $3 ***$2'

Note: The @'<newline>...<newline>'@ string literal used for the multiline input text is a so-called here-string.
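A short sketch of how such a verbatim (single-quoted) here-string behaves: its content is not expanded, and the newlines right after the opening @' and right before the closing '@ are not part of the string.

```powershell
# Verbatim here-string: no expansion, so $0 stays literal
$text = @'
111 is different from:
567. ($0 is not expanded here)
'@

# Split on either CRLF or LF, since the line endings follow the script file's
$lines = $text -split '\r?\n'
$lines.Count     # 2
$lines[-1]       # 567. ($0 is not expanded here)
```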

Output:

111 is different from:
111 is different from:
  *** This value 123 ***
123 is different from:
  *** This value 567 ***
567.
  • For a detailed explanation of the regex and the ability to experiment with it, see this regex101.com page, but in short:

    • (?m) is the inline form of the Multiline .NET regex option, which makes ^ and $ match at the start and end of each line.

    • ^(\d{3}) therefore matches a 3-digit sequence only at the start of a line, in a capture group, and the space followed by .+: matches a space and at least one additional character on the same line, all the way to a : at the end.

    • (\r?\n) captures the specific newline sequence encountered (which may be CRLF (Windows-format) or just LF (Unix-format)) in a 2nd capture group.

      • Capturing the specific newline sequence allows you to replicate it in the substitution string via placeholder $2, to ensure that the newly inserted line is terminated with the same sequence.

      • If you don't care about potentially mixing \r\n and \n in the resulting string, you could omit the 2nd capture group and use "`n" (sic) or "`r`n" instead, via an expandable string ("...") with an escape sequence. Note that, unlike in C#, \r and \n are not recognized in PowerShell string literals. Only the .NET regex engine recognizes them, and only in the regex operand; the substitution operand of -replace is not a regex, and only $-prefixed placeholders are recognized there.

        # Conceptually cleaner: separate the verbatim part from
        # the expandable part.
        ('$0  *** This value $2 ***' + "`n")
        
        # Alternative, using a single "..." string
        # The '$' chars. that are part of -replace *placeholders*
        # must be *escaped as `$* to prevent up-front expansion by PowerShell
        "`$0  *** This value `$2 ***`n"
        
    • (?!\1)(?=(\d{3})\b) uses both a negative ((?!...)) and positive (?=...) lookahead assertion to look for 3 digits at the start of the next line (at a word boundary, due to \b) that aren't the same as the 3 digits on the current line (\1 being a backreference to what the 1st capture group matched).

      • Note that using a capture group inside an overall by-definition non-capturing lookaround assertion is possible, and indeed used above to capture the 3-digit sequence at the start of the subsequent line, referenced via placeholder $3 in the substitution string.
    • In the substitution string, $0, $2 and $3 refer to what the entire regex, the 2nd capture group, and the 3rd capture group captured, respectively ($& may be used in lieu of $0; see this answer for more info about these placeholders).

      • Note that by using a string as the substitution operand, you are limited to embedding captured text as-is, via placeholders such as $0. If you need to determine the substitution text fully dynamically, i.e. if it needs to apply transformations based on each match:

        • In PowerShell (Core) 7+, you can use a script block { ... } instead.

        • In Windows PowerShell, you'll have to call the underlying [regex]::Replace() method directly.

      • See below.
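The point about capturing the newline sequence in group 2 can be checked directly. This sketch reuses the same -replace operation on CRLF (Windows-format) input, built inline as an assumption for illustration, and verifies that no bare LF ends up in the result:

```powershell
# Same -replace as above, applied to CRLF (Windows-format) input
$crlfText = "111 is different from:`r`n111 is different from:`r`n123 is different from:`r`n567."
$pattern  = '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)'

# $2 re-inserts the captured CRLF, so the inserted lines end the same way
$result = $crlfText -replace $pattern, '$0  *** This value $3 ***$2'

# 6 lines when split on CRLF, and no LF without a preceding CR
($result -split "`r`n").Count            # 6
$result -match '(?<!\r)\n'               # False
```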


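Two details from the placeholder notes above, a capture group inside a lookahead and $& as an alias for $0, in a minimal sketch on a hypothetical two-line input:

```powershell
$text    = "111 is different from:`n123."
$pattern = '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)'

# $3 comes from a group inside the lookahead: captured but not consumed,
# so '123' still appears (once) after the inserted text
$viaZero = $text -replace $pattern, '$0[next: $3]$2'
$viaAmp  = $text -replace $pattern, '$&[next: $3]$2'   # $& is the same as $0

$viaZero
$viaZero -ceq $viaAmp    # True
```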
To spell out the fully dynamic substitution approach, adding 1 to the captured number in this example:

PowerShell (Core) 7+ solution, using a script block ({ ... }) as -replace's substitution operand:

@'
111 is different from:
111 is different from:
123 is different from:
567.
'@ -replace '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', {
               '{0}  *** This value + 1: {1} ***{2}' -f $_.Value, ([int] $_.Groups[3].Value + 1), $_.Groups[2].Value
            }

Windows PowerShell solution, where a direct call to the underlying [regex]::Replace() method is required:

$str = @'
111 is different from:
111 is different from:
123 is different from:
567.
'@

[regex]::Replace(
  $str, 
  '(?m)^(\d{3}) .+:(\r?\n)(?!\1)(?=(\d{3})\b)', 
  {
    param($m)
    '{0}  *** This value + 1: {1} ***{2}' -f $m.Value, ([int] $m.Groups[3].Value + 1), $m.Groups[2].Value
  }
)

Output (note that 1 has been added to each captured value):

111 is different from:
111 is different from:
  *** This value + 1: 124 ***
123 is different from:
  *** This value + 1: 568 ***
567.