Parse structured file (FIX 4.4) in PowerShell

Time:06-26

I need to parse a structured file (FIX protocol 4.4) in PowerShell. Each line looks like this:

20220606-21:10:21.930 : 8=FIX.4.4 9=209 35=W 34=35 49=FIXDIRECT.FT 52=20220606-21:10:21.925 56=MM_EUR_FIX_QS 55=US30 262=96 268=2 269=0 270=32921.6 271=2000000 299=16ynjsz-16ynjsz5qCaA 269=1 270=32931.4 271=2000000 299=16ynjsz-16ynjsz5qCaA 10=048

I only need the values of specific tags. The first field (the timestamp, which runs up to the " : " separator) has no tag number and should be taken as-is. After that, I need the values that follow specific tag numbers, for example tags 55, 270 and 271 (270 and 271 occur more than once in this line).

I can parse the line positionally by splitting on " " and "=" and picking fixed indices:

$contents = Get-Content FIX.log
foreach($line in $contents) {
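    # Split on either "=" or " ". The [char[]] cast is needed in PowerShell 7+,
    # where .Split("= ") would treat "= " as one two-character separator.
    $s = $line.Split([char[]]'= ')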
    write-host $s[0] $s[17] $s[25] $s[27] $s[33] $s[35]
}

However, I would prefer to pick out the values by tag number, because some lines in the file do not have the same fields in the same positions.

Result should be something like this

20220606-21:10:21.930 US30 32921.6 2000000 32931.4 2000000

User response:

Combine -split, -match, and -replace as follows:

# Sample line that simulates your Get-Content call.
$content = '20220606-21:10:21.930 : 8=FIX.4.4 9=209 35=W 34=35 49=FIXDIRECT.FT 52=20220606-21:10:21.925 56=MM_EUR_FIX_QS 55=US30 262=96 268=2 269=0 270=32921.6 271=2000000 299=16ynjsz-16ynjsz5qCaA 269=1 270=32931.4 271=2000000 299=16ynjsz-16ynjsz5qCaA 10=048'

foreach ($line in $content) {

  # Split into fields, by " " or " : "
  $first, [array] $rest = $line -split ' (?:: )?'

  # Extract the tokens of interest:
  #  * Use the first one as-is
  #  * Among the remaining ones, use -match to filter in only
  #    those with the tag numbers of interest, then use -replace
  #    on the results to strip the tag number plus the separator ("=")
  #    from each.
  $tokensOfInterest =
    , $first + (($rest -match '^(?:55|270|271)=') -replace '^.+=')

  # Output the resulting array as a single-line, space-delimited
  # list, which is how Write-Host stringifies arrays.
  # Note: Do NOT use Write-Host to output *data*.
  Write-Host $tokensOfInterest

}

This yields the sample output in your question, namely:

20220606-21:10:21.930 US30 32921.6 2000000 32931.4 2000000
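
The comment in the loop above warns against using Write-Host for data. As a minimal sketch of the same -split / -match / -replace approach that emits its result to the success output stream instead (so it can be captured in a variable or piped), the steps could be wrapped in a function. The function name Get-FixTagValue, its parameters, and the shortened sample line are assumptions made for illustration; they are not part of the original answer.

```powershell
# Hypothetical helper (name and parameters are illustrative assumptions):
# returns the timestamp plus the values of the requested tags as one
# space-delimited string, written to the success output stream.
function Get-FixTagValue {
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [string] $Line,
        [string[]] $Tag = @('55', '270', '271')
    )
    process {
        # Fields are separated by " " or " : ".
        $first, [array] $rest = $Line -split ' (?:: )?'

        # Build the filter from the tag list, e.g. ^(?:55|270|271)=
        $pattern = '^(?:{0})=' -f ($Tag -join '|')

        # Keep only the matching fields, then strip the "tag=" prefix.
        # The outer @() keeps the result an array even when nothing matches.
        $values = @(@($rest -match $pattern) -replace '^[^=]+=')

        # No Write-Host: the joined string is emitted as output.
        (@($first) + $values) -join ' '
    }
}

# Shortened, illustrative sample line (an assumption, not a real log entry).
$sample = '20220606-21:10:21.930 : 8=FIX.4.4 35=W 55=US30 269=0 270=32921.6 271=2000000 269=1 270=32931.4 271=2000000 10=048'
$result = $sample | Get-FixTagValue
$result
```

With a real log, something like `Get-Content FIX.log | Get-FixTagValue` would process each line in turn. A line that contains none of the requested tags yields just its first field.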

User response:

Here is another take on the problem, using the .NET Regex class.

$contents = Get-Content FIX.log

# Tags to search for, separated by RegEx alternation operator
$tagsPattern = '55|270|271'

foreach($line in $contents) {
    # Extract the datetime field
    $dateTime = [regex]::Match( $line, '^\d{8}-\d{2}:\d{2}:\d{2}\.\d{3}' ).Value
    
    # Extract the desired tag values
    $tagValues = [regex]::Matches( $line, "(?<= (?:$tagsPattern)=)[^ ]+" ).Value

    # Output everything
    Write-Host $dateTime $tagValues
}
  • The [regex]::Match() method matches the first instance of the given pattern and returns a single Match object, whose Value property contains the matched value.
  • The [regex]::Matches() method finds all matches of the pattern. It returns a collection of Match objects. With the aid of PowerShell's convenient member access enumeration feature, we directly create an array of all Value properties.
  • Explanations and demos of the RegEx patterns are available at regex101.com.
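
The difference between the two methods, and the effect of member access enumeration, can be seen in a small sketch. The shortened sample line and the variable names are assumptions made for illustration only.

```powershell
# Shortened, illustrative sample line (an assumption, not a real log entry).
$line = '20220606-21:10:21.930 : 8=FIX.4.4 55=US30 270=32921.6 271=2000000 270=32931.4 271=2000000 10=048'

# Match() returns a single Match object for the first hit only.
$firstPrice = [regex]::Match($line, '(?<= 270=)[^ ]+')
$firstPrice.Value      # value of the first 270 tag
$firstPrice.Success    # indicates whether anything matched

# Matches() returns a MatchCollection containing every hit.
$allPrices = [regex]::Matches($line, '(?<= 270=)[^ ]+')
$allPrices.Count

# Member access enumeration: the collection itself has no Value property,
# so PowerShell reads .Value from each Match and returns the values.
$priceValues = $allPrices.Value
$priceValues -join ','
```

For a line that lacks the tag, Match() still returns an object, but its Success property is $false and its Value is an empty string, which is useful for detecting lines that do not conform.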