Returning Multiple Matches Per Line of Text File Plus Sort Order Using Powershell

Time:02-25

I have a test script as follows:

# listingParser.ps1     ---     by alphabetical order NOT numeric
   
$input_path = "C:\test\test3\Copy of 31832_226140__0001-00010.txt"
$output_file = "C:\test\test3\filterorder.txt"
$regexNum = '\d\d\d\s[A-Z][A-Z][A-Z]'           # Roll Entry Number and NAME

$result = Select-String -Path $input_path -Pattern $regexNum -CaseSensitive |
    % { $_.Matches } | % { $_.Value }

$result | Sort-Object > $output_file

The output looks like this:

001 BUZ
001 CAR
002 BUZ
003 BYE
005 BYE
007 CAR
008 BYF
009 BYF
010 CAR
011 BYG
012 CAR
014 CAR
017 BYT
018 BYT
021 CAD

The issues are:

  1. My search returns only one instance per line of text. Some lines of the text file contain multiple occurrences of the search pattern, some do not. At least, that is my interpretation of the result (I have just checked it again to be sure). How do I return ALL occurrences on any line of the file?

  2. My requirement is to sort on the [A-Z][A-Z][A-Z] part of each match, even though the search itself still needs the numeric part of the string.

For issue #2: the question "PowerShell: How do I sort a text file by column?" describes splitting each string and casting before sorting. I incorporated this but got errors. In the end I simplified it (removing the cast and the sort direction) and got it working:

$result | Sort-Object { $_.split()[-1] } > $output_file

So I think I have issue #2 sorted:

001 BUZ
002 BUZ
003 BYE
005 BYE
008 BYF
009 BYF
011 BYG
017 BYT
018 BYT
021 CAD
031 CAI
030 CAI
029 CAI
032 CAI
024 CAI

So that just leaves how to return all examples of my search.

Any suggestions would be appreciated.

Answer:

I believe using -AllMatches on Select-String should take care of finding all matches per line. An alternative is to use [regex]::Matches(..), which returns every occurrence of the pattern in its input string.
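To illustrate the difference, here is a minimal sketch using a made-up sample line ($line and the other variable names are hypothetical, not taken from the original file). By default Select-String reports only the first match on each input line, which would explain the behaviour described in the question:

```powershell
# Hypothetical sample line holding two entries (not from the real listing)
$line    = '088 XRQ 060 AXS'
$pattern = '\d{3} [A-Z]{3}'

# Default: only the first match on each line is reported
$first = ($line | Select-String -Pattern $pattern -CaseSensitive).Matches.Value

# With -AllMatches: every match on the line is reported
$all = ($line | Select-String -Pattern $pattern -CaseSensitive -AllMatches).Matches.Value

# [regex]::Matches also returns every match in the string it is given
$viaRegex = [regex]::Matches($line, $pattern).Value

"Without -AllMatches: $first"
"With -AllMatches:    $($all -join ', ')"
"Regex.Matches:       $($viaRegex -join ', ')"
```

Here $first should hold a single value, while $all and $viaRegex should each hold both entries.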

To sort alphabetically by the letters, I would personally use this as the sort expression:

{ [regex]::Match($_, '[A-Z]{3}').Value }

And if you then need to sort by the integers as well, you can combine it with:

{ [int][regex]::Match($_, '\d{3}').Value }

Below you can see both approaches in action. First, we can create an example of your file:

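# The letters A..Z as [char] values
$dict = ([int][char]'A'..[int][char]'Z').ForEach([char])
# Outputs two strings: a zero-padded 3-digit number and 3 random letters
function Ran {
    '{0:000}' -f (Get-Random -Maximum 100)
    $chars = 0..2 | ForEach-Object {
        Get-Random -Maximum $dict.Count
    }
    [string]::new($dict[$chars])
}

# 11 lines; odd-numbered iterations get two entries, the rest get one.
# The [string] cast joins the pieces with a space.
$testCase = 0..10 | ForEach-Object {
    [string]@(
        if($_ % 2) { Ran }
        Ran
    )
}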

Now, $testCase for me looks like this:

089 CDO
088 XRQ 060 AXS
023 XMH
019 OFM 021 PYD
054 PDY
041 GCG 003 HCJ
071 MCG
033 NAP 089 NPN
011 CEG
069 GDP 011 YTM
025 WQH

Next, we can test both Regex.Matches and Select-String:

$sortChar = { [regex]::Match($_, '[A-Z]{3}').Value }
$sortInt  = { [int][regex]::Match($_, '\d{3}').Value }

$re = [regex]::Matches($testCase, '\d{3} [A-Z]{3}').Value |
Sort-Object $sortChar, $sortInt

$sls = ($testCase | Select-string -Pattern '\d{3} [A-Z]{3}' -CaseSensitive -AllMatches).Matches.Value |
Sort-Object $sortChar, $sortInt

Lastly, we can check whether both give us the same (sorted) result:

Compare-Object -ReferenceObject $re -DifferenceObject $sls -SyncWindow 0 -IncludeEqual

InputObject SideIndicator
----------- -------------
060 AXS     ==
089 CDO     ==
011 CEG     ==
041 GCG     ==
069 GDP     ==
003 HCJ     ==
071 MCG     ==
033 NAP     ==
089 NPN     ==
019 OFM     ==
054 PDY     ==
021 PYD     ==
025 WQH     ==
023 XMH     ==
088 XRQ     ==
011 YTM     ==

Note: if you use the Regex.Matches alternative, read your file with -Raw on Get-Content so that the whole file is passed as a single string:

$content = Get-Content $input_path -Raw
[regex]::Matches($content, ...).Value | Sort-Object ... | Set-Content ...
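Put together, the pipeline might look like the sketch below. It is only an illustration: the temp-file paths and the three sample lines are assumptions standing in for the real listing and output file, while the pattern and sort expressions are the ones shown earlier.

```powershell
# Assumption: a small temp file stands in for the real listing
$tmp         = [System.IO.Path]::GetTempPath()
$input_path  = Join-Path $tmp 'listing-sample.txt'
$output_file = Join-Path $tmp 'filterorder-sample.txt'
Set-Content -Path $input_path -Value '019 OFM 021 PYD', '011 CEG', '069 GDP 011 YTM'

$sortChar = { [regex]::Match($_, '[A-Z]{3}').Value }     # sort by the letters first
$sortInt  = { [int][regex]::Match($_, '\d{3}').Value }   # then by the number

# -Raw returns the file as one string, so Matches scans every line in one call
$content = Get-Content -Path $input_path -Raw
[regex]::Matches($content, '\d{3} [A-Z]{3}').Value |
    Sort-Object $sortChar, $sortInt |
    Set-Content -Path $output_file

Get-Content -Path $output_file
```

With the sample lines above, the output file should contain five entries ordered by their letters, starting with 011 CEG and ending with 011 YTM.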