Using PowerShell, how do I extract a number from a list?

Time:09-02

I have a file with git commits,

c2b7b1913 Merged PR 38064: Lorem ipsum dolor sit amet
baa810a57 Merged PR 37937: Lorem ipsum dolor sit amet
8d67d563c Merged PR 37825: Lorem ipsum dolor sit amet
2a061da0b Merged PR 37494: Lorem ipsum dolor sit amet

How do I use PowerShell to get just the PR number? That is, I would like:

38064
37937
37825
37494

Here is my attempt:

Get-Content .\testdata.txt `
| Select-String -Pattern "^.*Merged PR (\d{5}):.*$" -AllMatches `
| ForEach-Object {$_.Matches.Groups}

This seems to match the correct data (output below), but how do I get at the regex capture group?

Groups   : {0, 1}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 54
Value    :  c2b7b1913 Merged PR 38064: Lorem ipsum dolor sit amet

Success  : True
Name     : 1
Captures : {1}
Index    : 21
Length   : 5
Value    : 38064

Groups   : {0, 1}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 54
Value    :  baa810a57 Merged PR 37937: Lorem ipsum dolor sit amet

Success  : True
Name     : 1
Captures : {1}
Index    : 21
Length   : 5
Value    : 37937

Groups   : {0, 1}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 54
Value    :  8d67d563c Merged PR 37825: Lorem ipsum dolor sit amet

Success  : True
Name     : 1
Captures : {1}
Index    : 21
Length   : 5
Value    : 37825

Groups   : {0, 1}
Success  : True
Name     : 0
Captures : {0}
Index    : 0
Length   : 54
Value    :  2a061da0b Merged PR 37494: Lorem ipsum dolor sit amet

Success  : True
Name     : 1
Captures : {1}
Index    : 21
Length   : 5
Value    : 37494

Here is the equivalent sed command

sed -nE 's/^.*Merged PR ([[:digit:]]{5}):.*$/\1/p' 

CodePudding user response:

You can get them by accessing the .Groups[1].Value property:

Get-Content .\testdata.txt `
 | Select-String -Pattern "Merged PR (\d{5}):" -AllMatches `
 | ForEach-Object {$_.Matches.Groups[1].Value}

Note that the ^.* and .*$ parts are not necessary, because PowerShell regex matching does not require the pattern to match the complete string.

The Merged PR (\d{5}): regex matches the Merged PR substring, captures the five digits that follow into a separate group (the (\d{5}) part), and requires a : char immediately after them. Once the match succeeds, you just read that group's value in the code.
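
To try this without a file, here is a minimal sketch that pipes two of the sample lines from the question straight into Select-String. The variable names $lines and $prNumbers are only illustrative, and it assumes each line contains a single PR reference.

```powershell
# Sample lines copied from the question; piping strings avoids needing a file.
$lines = @(
    'c2b7b1913 Merged PR 38064: Lorem ipsum dolor sit amet'
    'baa810a57 Merged PR 37937: Lorem ipsum dolor sit amet'
)

# The pattern is unanchored, so it matches in the middle of each line.
# Groups[0] is the whole match; Groups[1] is the (\d{5}) capture.
$prNumbers = $lines |
    Select-String -Pattern 'Merged PR (\d{5}):' |
    ForEach-Object { $_.Matches[0].Groups[1].Value }

$prNumbers
```

Indexing Matches[0] explicitly makes it clear that the group is read from the first match on each line.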

CodePudding user response:

You could simply use split:

gc C:\tmp\testdata.txt | %{($_ -split " ")[3] -replace ":"}
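
One caveat: splitting on a single space depends on the exact spacing. If the real lines start with a space, as the Value and Index fields in the question's output suggest they might, every field shifts by one. A sketch that tolerates that, assuming the PR number is always the fourth whitespace-separated field (the names $lines, $fields and $prNumbers are made up for the example):

```powershell
# The first sample line has a leading space (an assumption about the real file).
$lines = @(
    ' c2b7b1913 Merged PR 38064: Lorem ipsum dolor sit amet'
    'baa810a57 Merged PR 37937: Lorem ipsum dolor sit amet'
)

$prNumbers = $lines | ForEach-Object {
    # Trim first, then split on runs of whitespace so extra spaces don't shift fields.
    $fields = $_.Trim() -split '\s+'
    # The fourth field looks like '38064:'; drop the trailing colon.
    $fields[3].TrimEnd(':')
}

$prNumbers
```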

Or back to your version:

$data = gc C:\tmp\testdata.txt | Select-String -Pattern "\d{5}" -AllMatches
$data.Matches.Value

CodePudding user response:

You don't need to use Get-Content; Select-String has a -Path parameter:

Select-String -Path <path> -Pattern '(?<=PR\s)[0-9]+' | ForEach-Object { $_.Matches.Value }
(?<=PR\s)[0-9]+

(?<=PR\s) : Lookbehind: requires PR plus one whitespace character just before the match, but doesn't include it in the result, example - 'PR '
[0-9]+    : 1 or more digits

Example:

<#
    File contents in C:\tmp\testmsg.txt
    c2b7b1913 Merged PR 38064: Lorem ipsum dolor sit amet
    baa810a57 Merged PR 37937: Lorem ipsum dolor sit amet
    8d67d563c Merged PR 37825: Lorem ipsum dolor sit amet
    2a061da0b Merged PR 37494: Lorem ipsum dolor sit amet
#>

Select-String -Path C:\tmp\testmsg.txt -Pattern '(?<=PR\s)[0-9]+' | ForEach-Object { $_.Matches.Value }

38064
37937
37825
37494
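
To see what the lookbehind changes, this small sketch compares the pattern with and without it on one sample line. It calls the .NET [regex]::Match method directly so no file is needed, and it assumes the quantifier after [0-9] is + (matching the "1 or more digits" description). The variable names are only illustrative.

```powershell
$line = 'c2b7b1913 Merged PR 38064: Lorem ipsum dolor sit amet'

# Without a lookbehind, the 'PR ' prefix is part of the matched text.
$withPrefix = [regex]::Match($line, 'PR\s[0-9]+').Value

# With the lookbehind, 'PR ' must precede the digits but is not part of the match.
$digitsOnly = [regex]::Match($line, '(?<=PR\s)[0-9]+').Value

$withPrefix   # PR 38064
$digitsOnly   # 38064
```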