Not able to capture with capture groups

Time:10-18

Each text file I am working with contains two pages of listings. Both pages use the same reference numbers (001, 002, etc.). The aim is to separate the two listings and store each one as a separate element of an array. I have simplified the problem for testing.

# short listings (an unhelpful text file listing structure. But ain't that life?)
# "001 First in listing one 002 Second in listing one 003 Third in listing one 001 First in listing two 002 Second in listing two 003 Third in listing two"

# read shortListings from file 
$listings = Get-Content -Path C:\test\test6\shortListings.txt
[string[]]$result = $null

# regular expression where I am trying to separate listing 'ones' & listing 'twos' into string array $result 

# $result[0] = "001 First in listing one 002 Second in listing one 003 Third in listing one"
# $result[1] = "001 First in listing two 002 Second in listing two 003 Third in listing two"

$regex = '(?s)(001.*?)((?=001.*?))'

$result = $listings | Select-String -Pattern $regex -AllMatches | ForEach-Object { $_.Matches.Value}

# Okay. That's listing one. But how do I get listing two?
$result[0]

This is close.

CodePudding user response:

Here is one way this could be done, using a combination of -split and Group-Object:

$result = (Get-Content shortListings.txt -Raw) -split '\s*(?=\d{3})' -ne '' |
    Group-Object { [regex]::Match($_, '\w+$').Value } -AsHashTable -AsString

However, with this method each value in the hashtable is an array of strings rather than a single string. You can -join them later:

PS ..\pwsh> $result

Name                 Value
----                 -----
one                  {001 First in listing one, 002 Second...
two                  {001 First in listing two, 002 Second...

PS ..\pwsh> $result['one']

001 First in listing one
002 Second in listing one
003 Third in listing one

PS ..\pwsh> $result['one'] -join ' '

001 First in listing one 002 Second in listing one 003 Third in listing one
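
To end up with the two-element string array the question asks for, the grouped values can be joined per key. This is only a sketch: it uses the sample text inline instead of reading the file, and the variable names $text and $groups are made up for illustration. The keys are listed explicitly because a hashtable does not guarantee any order.

```powershell
# Sample text from the question, inlined here instead of being read from a file
$text = '001 First in listing one 002 Second in listing one 003 Third in listing one 001 First in listing two 002 Second in listing two 003 Third in listing two'

# Split before every three-digit reference number, drop the empty strings,
# then group the entries by their last word ("one" or "two")
$groups = $text -split '\s*(?=\d{3})' -ne '' |
    Group-Object { [regex]::Match($_, '\w+$').Value } -AsHashTable -AsString

# Join each group back into a single string, in a fixed key order
[string[]]$result = 'one', 'two' | ForEach-Object { $groups[$_] -join ' ' }

$result[0]
$result[1]
```

Grouping on the last word only works because every entry in the sample ends with the name of its listing; real listings may need a different grouping key.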

CodePudding user response:

# read shortListings from file 
$listings = Get-Content -Path C:\test\test6\shortListings.txt

# regex produces 3 groups 0=complete listing 1=listings in one 2=listings in two
$regex = '(?s)(001.*?)(?=001.*?)(001.*?$)'

# access group matches using array $line
[string[]]$line = $null
$line = $listings | Select-String -AllMatches -Pattern $regex | ForEach-Object {$_.Matches.groups}

for ($i = 0; $i -lt $line.Count; $i++) {
    if ($i -eq 0) {
        # do nothing for complete listings
    }
    else {
        Write-Host "group:"$i $line[$i]
    }
   
}

Output:

group: 1 001 First in listing one 002 Second in listing one 003 Third in listing one 
group: 2 001 First in listing two 002 Second in listing two 003 Third in listing two
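
The pattern in the question only returned listing one because its lookahead requires another 001 after the match, and the last listing has none. The pattern above fixes that by hard-coding exactly two groups. A possible alternative, sketched here with the sample text inlined, lets the lookahead accept either the next 001 or the end of the text, so it handles any number of listings. It assumes that 001 appears only as the first reference number of a listing.

```powershell
# Sample text from the question, inlined here instead of being read from a file
$text = '001 First in listing one 002 Second in listing one 003 Third in listing one 001 First in listing two 002 Second in listing two 003 Third in listing two'

# Each match starts at 001 and runs lazily up to (but not including)
# the whitespace before the next 001, or up to the end of the text
$regex = '(?s)001.*?(?=\s*(?:001|\z))'

# One array element per listing
[string[]]$result = [regex]::Matches($text, $regex) | ForEach-Object { $_.Value }

$result[0]
$result[1]
```

When reading from the file, Get-Content with -Raw would hand the regex the whole text as one string, which matters if a listing spans several lines.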