Powershell and regex to extract multiple points of data from a file

Time:01-02

I am trying to extract multiple pieces of data (First, Last, ID number) from a rather nasty log file.

I have this:

Get-Content c:\LOG\22JAN01.log | Out-String | % {[Regex]::Matches($_, "(?<=FIRST:)((.|\n)*?)(?=LAST:)")} | % {$_.Value}

This does a fine job of extracting the first name, but I also need to get the last name and ID number from the same line and present them together, like "BOB SMITH 123456".

Each line of the log file looks like this:

FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304

I would like the output to look something like:

  • BOB SMITH 123456
  • JACK JONES 029506
  • KAREN KARPENTER 6890298

So far I can only manage to get all the first names and nothing else. Thanks for any help pointing me in the right direction!
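Since every field on the sample line has the form KEY:value, one option (not used in the answers below) is to split a line into all of its fields at once and then pick the ones you need. A minimal sketch, assuming keys are always runs of capital letters and no value contains a capitalized word directly followed by a colon; `$line` and `$fields` are illustrative names:

```powershell
# Sample line copied from the question
$line = 'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'

# A key is a run of capital letters followed by ':'; its value runs (lazily) up to
# the next " KEY:" or the end of the line. Keys are letters only, so the digits in
# "11:55:47" inside TIME are not mistaken for a key.
$fields = [ordered]@{}
foreach ($m in [regex]::Matches($line, '(?<key>[A-Z]+):(?<value>.*?)(?=\s+[A-Z]+:|$)')) {
    $fields[$m.Groups['key'].Value] = $m.Groups['value'].Value
}

# Pick out the three values the question asks for
'{0} {1} {2}' -f $fields['FIRST'], $fields['LAST'], $fields['ID']   # BOB SMITH 123456
```

The same loop can be wrapped in a ForEach-Object over Get-Content output to process every line of the file.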

Answer:

If they are always on the same line, I like to use switch to read it.

switch -Regex -File c:\LOG\22JAN01.log {
    'FIRST:(\w+) LAST:(.+) DOOR.+ ID:(\d+) ' {
        [PSCustomObject]@{
            First = $matches[1]
            Last  = $matches[2]
            ID    = $matches[3]
        }
    }
}
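To try the switch approach without a log file, `switch -Regex` also accepts an array of strings in place of `-File`. A minimal sketch using sample lines that appear in this thread (the `$lines` and `$parsed` variables are illustrative); inside the branch, the automatic `$matches` variable holds the groups captured by the pattern that just matched:

```powershell
# Sample lines taken from this thread; in real use, switch on -File instead
$lines = @(
    'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
    'FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
)

# switch iterates over the array; each line that matches emits one object
$parsed = switch -Regex ($lines) {
    'FIRST:(\w+) LAST:(.+) DOOR.+ ID:(\d+) ' {
        [PSCustomObject]@{
            First = $matches[1]   # first capture group
            Last  = $matches[2]   # second capture group
            ID    = $matches[3]   # third capture group
        }
    }
}

$parsed
```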

Sample output:

First Last      ID     
----- ----      --     
BOB   SMITH     123456 
JACK  JONES     029506 
KAREN KARPENTER 6890298

You can capture it to a variable and then continue using the objects however you like.

$output = switch -Regex -File c:\LOG\22JAN01.log {
    'FIRST:(\w+) LAST:(.+) DOOR.+ ID:(\d+) ' {
        [PSCustomObject]@{
            First = $matches[1]
            Last  = $matches[2]
            ID    = $matches[3]
        }
    }
}

$output | Out-GridView

$output | Export-Csv -Path c:\Log\parsed_log.log -NoTypeInformation
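If you want the plain "FIRST LAST ID" lines from the question rather than objects, the objects can be formatted with the -f operator. A minimal sketch; the two objects are built by hand here (mirroring rows of the sample output above) as a stand-in for `$output`, so it runs without a log file:

```powershell
# Stand-in for $output from the switch statement above
$output = @(
    [PSCustomObject]@{ First = 'BOB';  Last = 'SMITH'; ID = '123456' }
    [PSCustomObject]@{ First = 'JACK'; Last = 'JONES'; ID = '029506' }
)

# One "FIRST LAST ID" string per object; ID stays a string, so leading zeros survive
$formatted = $output | ForEach-Object { '{0} {1} {2}' -f $_.First, $_.Last, $_.ID }
$formatted
```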

Answer:

You need to use capture groups, (...), one for each value you want to extract.

Assuming that the first name and last name are strings of consecutive capital letters A-Z, you could use, for example:

Select-String -Path c:\LOG\22JAN01.log -Pattern "^FIRST:([A-Z]+) LAST:([A-Z]+) .*? ID:(\d+)" -AllMatches | % {$_.Matches} | % {@($_.Groups[1].Value, $_.Groups[2].Value, $_.Groups[3].Value) -join " "}
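Select-String also accepts strings from the pipeline, which makes the pattern easy to test before pointing it at the file. A sketch of the same idea using named groups instead of numbered ones (the group names first, last and id, and the sample lines taken from this thread, are illustrative choices):

```powershell
$lines = @(
    'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
    'FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
)

# Named groups make the ForEach-Object body independent of group numbering
$pattern = '^FIRST:(?<first>[A-Z]+) LAST:(?<last>[A-Z]+) .*? ID:(?<id>\d+)'

$result = $lines | Select-String -Pattern $pattern | ForEach-Object {
    $g = $_.Matches[0].Groups   # groups of the first match on this line
    '{0} {1} {2}' -f $g['first'].Value, $g['last'].Value, $g['id'].Value
}
$result
```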

Answer:

Using this reusable function:
(See also: #16257 String >>>Regex>>> PSCustomObject)

function ConvertFrom-Text {
    [CmdletBinding()]Param (
        [Regex]$Pattern,
        [Parameter(Mandatory = $True, ValueFromPipeLine = $True)]$InputObject
    )
    process {
        if ($InputObject -match $Pattern) {
            $matches.Remove(0)
            [PSCustomObject]$matches
        }
    }
}

$log = @(
    'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
    'FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
)

$Log | ConvertFrom-Text -Pattern '\bFIRST:(?<First>\S*).*\bLAST:(?<Last>\S*).*\bID:(?<ID>\d+)'

ID     Last  First
--     ----  -----
123456 SMITH BOB
789101 DOE   JOHN
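Casting the $matches hashtable to [PSCustomObject] does not keep the group order, which is why ID comes out first in the output above. A variant that keeps the order in which the named groups appear in the pattern, using the .NET Regex.GetGroupNames method; the function name ConvertFrom-TextOrdered is hypothetical:

```powershell
function ConvertFrom-TextOrdered {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $true)][regex]$Pattern,
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)][string]$InputObject
    )
    process {
        $m = $Pattern.Match($InputObject)
        if ($m.Success) {
            # GetGroupNames lists groups by number: "0" (the whole match) first,
            # then, for a pattern with only named groups, those names in pattern order
            $props = [ordered]@{}
            foreach ($name in $Pattern.GetGroupNames()) {
                if ($name -ne '0') { $props[$name] = $m.Groups[$name].Value }
            }
            [PSCustomObject]$props
        }
    }
}

$log = @(
    'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
    'FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
)

$rows = $log | ConvertFrom-TextOrdered -Pattern '\bFIRST:(?<First>\S*).*\bLAST:(?<Last>\S*).*\bID:(?<ID>\d+)'
$rows
```

With this variant the columns come out as First, Last, ID.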

Answer:

Assuming the log file looks literally like the quoted sample, you could match it like this:

$log = @'
FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304
FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304
'@

$re = [regex]'(?si)FIRST:(?<first>.*?)\s*LAST:(?<last>.*?)\s*DOOR.*?ID:(?<id>.*?)\s'

foreach($match in $re.Matches($log))
{
    '{0} {1} {2}' -f
        $match.Groups['first'].Value,
        $match.Groups['last'].Value,
        $match.Groups['id'].Value
}

# Results in:
BOB SMITH 123456
JOHN DOE 789101

This regex works on a single multi-line string, so read the file with Get-Content -Raw:

$re = [regex]'(?si)FIRST:(?<first>.*?)\s*LAST:(?<last>.*?)\s*DOOR.*?ID:(?<id>.*?)\s'

$result = foreach($match in $re.Matches((Get-Content ./test.log -Raw)))
{
    [pscustomobject]@{
        First = $match.Groups['first'].Value
        Last  = $match.Groups['last'].Value
        ID    = $match.Groups['id'].Value
    }
}

$result | Export-Csv path/to/newlog.csv -NoTypeInformation
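A self-contained sketch of the -Raw version that first writes sample lines to a temporary file (created with New-TemporaryFile, standing in for ./test.log), so the regex can be checked against real file content including line breaks:

```powershell
$re = [regex]'(?si)FIRST:(?<first>.*?)\s*LAST:(?<last>.*?)\s*DOOR.*?ID:(?<id>.*?)\s'

# Write the two sample lines from this answer to a temporary file
$tmp = New-TemporaryFile
Set-Content -Path $tmp.FullName -Value @(
    'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
    'FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
)

# -Raw returns the whole file as one string, so one Matches call sees every line
$text = Get-Content -Path $tmp.FullName -Raw
$result = foreach ($match in $re.Matches($text)) {
    [pscustomobject]@{
        First = $match.Groups['first'].Value
        Last  = $match.Groups['last'].Value
        ID    = $match.Groups['id'].Value
    }
}

# Clean up the temporary file
Remove-Item -Path $tmp.FullName
$result
```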

See https://regex101.com/r/EcQbjE/1 for the regex explanation.
