Regex select multiple lines in PowerShell

I created a file like this

echo "test 1", Hello, foo, bar, "a better world", "test 2" > test.txt

and the result is this:

test 1
Hello
foo
bar
a better world
test 2

I need to remove all the text starting with the keyword "Hello" and ending with "world", including both keywords.

Something like this

test 1
test 2

I tried

$pattern='(?s)(?<=/Hello/\r?\n).*?(?=world)'
(Get-Content -Path .\test.txt -Raw) -replace $pattern, "" | Set-Content -Path .\test.txt

but nothing happened. What can I try?

CodePudding user response：

Assuming you want to remove the starting and ending keywords as well, you could use either (?s)\s*Hello.*world or (?s)\s*Hello.*?world, depending on whether you want .* to be greedy (match through the last world) or lazy (stop at the first world).

(Get-Content path\to\file.txt -Raw) -replace '(?s)\s*Hello.*world' |
    Set-Content path\to\result.txt

Use -creplace for case-sensitive matching of the keywords.
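To make the greedy/lazy distinction concrete, here is a minimal sketch on in-memory strings. The second world line and the upper-case sample are hypothetical additions, not part of the file in the question.

```powershell
# Hypothetical sample with TWO 'world' lines, to show where each pattern stops.
$text = "test 1`nHello`nfoo`nworld`nkeep me`nworld`ntest 2"

# Greedy: .* runs to the LAST 'world', so 'keep me' is removed as well.
$greedy = $text -replace '(?s)\s*Hello.*world'

# Lazy: .*? stops at the FIRST 'world', so 'keep me' and the second 'world' remain.
$lazy = $text -replace '(?s)\s*Hello.*?world'

# -replace is case-insensitive by default; -creplace is case-sensitive.
$upper = "test 1`nHELLO`nfoo`nWORLD`ntest 2"
$insensitive = $upper -replace  '(?s)\s*Hello.*world'   # keywords match despite the case
$sensitive   = $upper -creplace '(?s)\s*Hello.*world'   # no match, string is unchanged

$greedy
$lazy
```

With only one world line in the file, as in the question, both patterns give the same result.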

CodePudding user response：

Leaving aside that there are extraneous / characters in your regex, reformulate it as follows (tip of the hat to Santiago Squarzon):

$pattern = '(?sm)^Hello\r?\n.*?world\r?\n'

(Get-Content -Path .\test.txt -Raw) -replace $pattern | 
  Set-Content -Path .\test.txt

This removes everything from the Hello line through the (first) subsequent line that ends in world, including the newline that follows it. This yields the desired output shown in your question.
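The round trip can be tried end to end with a sketch like the following. The scratch file name regex-demo.txt is a hypothetical choice so that the real test.txt is left alone, and -NoNewline is an optional addition (it assumes PowerShell 5.0 or later) that keeps Set-Content from appending a further newline to the already newline-terminated string.

```powershell
# Hypothetical scratch file so the original test.txt is not touched.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'regex-demo.txt'

# Recreate the sample content, including the 'a better world' line.
'test 1', 'Hello', 'foo', 'bar', 'a better world', 'test 2' |
  Set-Content -Path $path

$pattern = '(?sm)^Hello\r?\n.*?world\r?\n'

# -Raw reads the whole file as a single string. The parentheses make sure the
# file is read completely before Set-Content starts writing to the same path.
(Get-Content -Path $path -Raw) -replace $pattern |
  Set-Content -Path $path -NoNewline

# Read the file back line by line to inspect the result.
$result = Get-Content -Path $path
$result
```

Because the pattern uses \r?\n, it should work whether the file has Windows (CRLF) or Unix (LF) line endings.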

As for what you tried:

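The extraneous / chars. are why nothing happened at all: the file contains no literal /Hello/, so the look-behind never matches. Even without them, your primary problem is that you are using look-around assertions ((?<=...), (?=...)). What a look-around matches does not become part of the overall match and is therefore not replaced by -replace, so Hello and world would stay in the file.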
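A small sketch on an in-memory string illustrates both points. The string below is assumed to mirror the file content with CRLF line endings, and the variable names are only for illustration.

```powershell
# Assumed to mirror test.txt as read with Get-Content -Raw (CRLF line endings).
$text = "test 1`r`nHello`r`nfoo`r`nbar`r`na better world`r`ntest 2`r`n"

# Original attempt: the literal slashes never occur in the text, so nothing matches.
$withSlashes = $text -replace '(?s)(?<=/Hello/\r?\n).*?(?=world)'

# Slashes removed: the look-arounds only assert, so 'Hello' and 'world' are kept
# and only the text between them is removed.
$lookAround = $text -replace '(?s)(?<=Hello\r?\n).*?(?=world)'

# Keywords matched directly: they are part of the match and get removed too.
$consumed = $text -replace '(?sm)^Hello\r?\n.*?world\r?\n'

$withSlashes -ceq $text   # True: the input is unchanged
$lookAround               # test 1 / Hello / world / test 2
$consumed                 # test 1 / test 2
```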

CodePudding user response：

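I think this is a duplicate of "How can I deleted lines from a certain position?" or any of the other duplicates linked from it. Note that SelectString is not the built-in Select-String cmdlet (which has no -From or -To parameters); presumably it is a custom function from the linked answer:

'test1', 'Hello', 'foo', 'bar', 'world', 'test2' | SelectString -From '(?=Hello)' -To '(?<=world)'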