How do I Output Substring to Newline from a Raw Text String using Regex

Time:03-09

I have a name delimiter that I want to use to extract the whole line where it is found.

[string]$testString = $null

# multi-line string (text plus newlines) that simulates $testString = Get-Content -Raw

$testString = "initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text"

# test1
# simulates "text before (?<content>.*) text after"; without (?s), '.' does not match a newline, so this returns only "initial text"
# $testString -match "(?<BOURKE>.*)"

# test2
# (?s) lets '.' match newlines as well, so this returns all of the text and the output looks exactly like $testString as defined
$testString -match "(?s)(?<BOURKE>.*)"

#test3
# I want just the line with BOURKE

$result = $matches['BOURKE']

$result

Test1 finds a match, but only up to the first newline. Test2 finds a match, but it includes all the newlines. What regex pattern would make the output begin at 001 BOURKE ... and stop at the end of that line?

Any suggestions would be appreciated.

Answer:

While a pure regex solution is possible, in this case I suggest delegating to the Select-String cmdlet, whose very purpose is to find the whole lines on which a given regex or literal substring (-SimpleMatch) matches:

(Select-String -LiteralPath file.txt -Pattern BOURKE).Line

Add -CaseSensitive for case-sensitive matching.
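Select-String emits MatchInfo objects, so besides .Line you can also read .LineNumber. The sketch below is a minimal illustration that assumes a small hypothetical in-memory array ($lines) in place of file.txt, and combines -SimpleMatch (literal substring, no regex) with -CaseSensitive:

```powershell
# Hypothetical sample lines standing in for the contents of file.txt
$lines = @(
  'initial text'
  '001 BOURKE, Bridget Mary, 13 Mahina Road'
  'a line mentioning bourke in lower case'
)

# -SimpleMatch treats 'BOURKE' as a literal substring;
# -CaseSensitive makes the lower-case 'bourke' line not qualify
$hits = $lines | Select-String -SimpleMatch -CaseSensitive -Pattern 'BOURKE'

foreach ($hit in $hits) {
  # For piped strings, LineNumber is the 1-based position of the input object
  '{0}: {1}' -f $hit.LineNumber, $hit.Line
}
```

Without -CaseSensitive, the third line would be reported as well.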

The following example simulates the above (-split '\r?\n' splits the multiline input string into individual lines):

(
  @'
initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text
'@ -split '\r?\n' |
    Select-String -Pattern BOURKE
).Line

Output:

001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...

If you do need a pure regex solution that operates directly on a multi-line input string:

if (
  @'
initial text
preliminary text
unfinished line bfore the line I want
001 BOURKE, Bridget Mary ....... ........... 13 Mahina Road, Mahina Bay.Producrs/As 002 BOURKE. David Gerard ...
line after the line I want
extra text
extra extra text
'@ -match '(?m)^.*BOURKE.*') {
  $Matches[0]
}

To match case-sensitively, use -cmatch instead of -match.
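To make the difference concrete, here is a small sketch using a hypothetical $text whose first candidate line contains a lower-case "bourke". -match picks that line because it ignores case, while -cmatch skips it:

```powershell
# Hypothetical input: a lower-case 'bourke' line precedes the real entry
$text = "intro`nbourke, lower-case entry`n001 BOURKE, Bridget Mary"

# -match is case-insensitive: the first line containing 'bourke' in any case wins
if ($text -match '(?m)^.*BOURKE.*') { $insensitive = $Matches[0] }

# -cmatch is case-sensitive: only the upper-case 'BOURKE' line qualifies
if ($text -cmatch '(?m)^.*BOURKE.*') { $sensitive = $Matches[0] }

$insensitive   # bourke, lower-case entry
$sensitive     # 001 BOURKE, Bridget Mary
```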

For an explanation of the regex and the ability to experiment with it, see this regex101.com page.
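Note that -match only reports the first match. If several lines can contain the name (the sample line itself mentions two BOURKE entries), one option is the .NET [regex]::Matches method, which returns every match. This is a sketch with a hypothetical $text; unlike -match, [regex]::Matches is case-sensitive by default:

```powershell
# Hypothetical input with two separate lines that mention BOURKE
$text = "initial text`n001 BOURKE, Bridget Mary`nother text`n002 BOURKE, David Gerard"

# (?m) makes ^ match at the start of every line; [regex]::Matches returns
# all matches (case-sensitively by default), and ForEach-Object Value
# extracts the matched text of each one
$allLines = [regex]::Matches($text, '(?m)^.*BOURKE.*') | ForEach-Object Value

$allLines
```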

Answer:

I find it best to have the match consume everything up to what is not wanted, namely the \r\n line break. That can be done with a negated character class: a ^ at the start of a set, as in [^\r\n], matches any single character that is neither \r nor \n. Repeating that class therefore consumes everything up to the end of the line.

To do that, use:

$testString -match "(?<Bourke>\d\d\d\s[^\r\n]+)"
$Matches['Bourke']
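Excluding \r matters most for text read from a Windows file, whose lines end in \r\n. In .NET regex, . matches any character except \n, so it will consume a trailing \r. The sketch below uses a hypothetical CRLF string ($crlfText) to compare .+ with [^\r\n]+:

```powershell
# Hypothetical text with Windows (CRLF) line endings
$crlfText = "before`r`n001 BOURKE, Bridget Mary`r`nafter"

# '.' stops at \n but still consumes \r, so this result ends with a carriage return
$null = $crlfText -match '\d\d\d\s.+'
$withDot = $Matches[0]

# [^\r\n] excludes both characters, so the match stops cleanly before the line break
$null = $crlfText -match '\d\d\d\s[^\r\n]+'
$withClass = $Matches[0]

$withDot.EndsWith("`r")     # True
$withClass.EndsWith("`r")   # False
```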

Also, avoid * when you know there will be text to match. * (zero or more) is greedy and consumes everything it can, yet it also permits an empty match. Using + (one or more) instead constrains the match considerably, because the engine never has to consider the zero-length case that * allows. During backtracking, as it is called, that case would otherwise be tried even though it is patently implausible.
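The practical difference between * and + shows up when nothing follows the fixed part of the pattern. In this sketch, the hypothetical input '001 ' has no text after the digits and the space: the * version still succeeds with an empty remainder, while the + version requires at least one more character:

```powershell
# Hypothetical input: an entry number followed by nothing else
$emptyEntry = '001 '

# '*' (zero or more) accepts an empty remainder, so this succeeds
$starResult = $emptyEntry -match '\d\d\d\s[^\r\n]*'

# '+' (one or more) needs at least one non-newline character after the space
$plusResult = $emptyEntry -match '\d\d\d\s[^\r\n]+'

$starResult   # True
$plusResult   # False
```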
