I have a name delimiter that I want to use to extract the whole line where it is found.
[string]$testString = $null
# broken text string of text & newlines which simulates $testString = Get-Content -Raw
$testString = "initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text"
# test1
# simulate text string before(?<content>.*)text string after - this returns "initial text" only (no newline or anything after)
# $testString -match "(?<BOURKE>.*)"
# test2
# this returns all text, including the newlines, so that $testString outputs exactly as it is defined
$testString -match "(?s)(?<BOURKE>.*)"
#test3
# I want just the line with BOURKE
$result = $matches['BOURKE']
$result
Test 1 finds the match but only prints up to the first newline. Test 2 finds the match and includes all newlines. I would like to know which regex pattern forces the output to begin at 001 BOURKE ...
Any suggestions would be appreciated.
CodePudding user response:
I find it best to have a match consume everything up to what is not needed: the \r\n. That can be done with a negated character set, i.e. a set that starts with ^, such as [^\r\n], which says to consume up to either a \r or a \n. Hence it matches everything that is not a \r or a \n.
To do that, use
$testString -match "(?<Bourke>\d\d\d\s[^\r\n]+)"
Also, one should try to avoid the * when one knows there will be matchable text. The * is a greedy quantifier meaning zero or more, so it can also match nothing. Using the +, one or more, limits the match considerably because the parser doesn't have to try the patently implausible zero-length case (the zero in the * quantifier's zero or more); that retrying is called backtracking.
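As a quick check of the pattern above, here is a minimal sketch that runs it against a shortened stand-in for the question's text. The `$sample` and `$bourkeLine` names and the sample content are illustrative, and the pattern assumes a `+` quantifier after the character set.

```powershell
# Illustrative stand-in for the question's multi-line string (LF newlines).
$sample = "initial text`n001 BOURKE, Bridget Mary`nline after the line I want"

# \d\d\d\s anchors the match at the three-digit prefix;
# [^\r\n]+ then consumes one or more characters that are not CR or LF,
# so the match stops at the end of that line.
if ($sample -match "(?<Bourke>\d\d\d\s[^\r\n]+)") {
    $bourkeLine = $Matches['Bourke']
    $bourkeLine
}
```

Note that this relies on the wanted line being the first one that starts with three digits and whitespace; it does not look for the name itself.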
CodePudding user response:
While a pure regex solution is possible (see bottom section), in this case I suggest delegating to the Select-String cmdlet, whose very purpose is to find the whole lines on which a given regex or literal substring (-SimpleMatch) matches:
(Select-String -LiteralPath file.txt -Pattern BOURKE).Line
Add -CaseSensitive for case-sensitive matching.
The following example simulates the above (-split '\r?\n' splits the multiline input string into individual lines):
(
@'
initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text
'@ -split '\r?\n' |
Select-String -Pattern BOURKE
).Line
Output:
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
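For the literal-substring and case-sensitive variants mentioned earlier, a small sketch may help. The `$lines` array and `$literalHits` variable are made up for illustration; with -SimpleMatch the trailing period in the pattern is treated as a literal character rather than a regex wildcard.

```powershell
# Hypothetical input lines; only the second entry contains the literal,
# upper-case substring 'BOURKE.' (name followed by a period).
$lines = @(
    '001 BOURKE, Bridget Mary'
    '002 BOURKE. David Gerard'
    '003 bourke. lower case'
)

# -SimpleMatch: treat the pattern as a literal string, not a regex.
# -CaseSensitive: do not match 'bourke.'.
$literalHits = ($lines | Select-String -SimpleMatch -CaseSensitive -Pattern 'BOURKE.').Line
$literalHits
```

Without -SimpleMatch, the period would match any character, so the first line (where a comma follows the name) would be returned as well.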
If you do need a pure regex solution that operates directly on a multi-line input string:
if (
@'
initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text
'@ -match '.*BOURKE.*') {
$Matches[0]
}
To match case-sensitively, use -cmatch instead of -match.
For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
Note: If your input string uses Windows CRLF newlines (\r\n) instead of Unix LF newlines (\n), use the following regex instead, to avoid capturing the CR (\r) at the end of the line:
'.*BOURKE[^\r\n]*'
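To see the difference between the two patterns, here is a minimal sketch using a made-up string with CRLF newlines. The variable names (`$crlfText`, `$withCr`, `$withoutCr`) are hypothetical.

```powershell
# Hypothetical sample with Windows-style CRLF line endings.
$crlfText = "line before`r`n001 BOURKE, Bridget Mary`r`nline after"

# '.' matches any character except LF, so it also matches CR:
# the match keeps a trailing CR.
$null = $crlfText -match '.*BOURKE.*'
$withCr = $Matches[0]

# [^\r\n]* stops before the CR, so the line ending is excluded.
$null = $crlfText -match '.*BOURKE[^\r\n]*'
$withoutCr = $Matches[0]

# Compare the two results; the first is one character (the CR) longer.
'{0} vs. {1} characters' -f $withCr.Length, $withoutCr.Length
```

On input that uses LF-only newlines both patterns return the same text, so the second form is a reasonable default when the newline style is not known in advance.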