PowerShell: Pattern Matching Text File Contents to Insert into .CSV

Time:05-11

I have been struggling to break apart the contents of a text file and insert them into a .csv file according to the following rules:

  1. Each line starting with '>' should be inserted into .csv column 1
  2. The all-caps lines that follow should be inserted into .csv column 2, with each block of capital letters joined into a single string (its `r and `n line breaks removed)
  3. '>' and '*' should be removed where present

Separately, I can get column 1 to work fairly well using:

$file = (Get-Content 'samplefile.txt')

$data = foreach ($line in $file) {
    if ($line -match '^>') {
        [pscustomobject]@{
            'Part1' = (Select-String '^>' -InputObject $line) -replace '>', ''
        }
    }
}
$data | Out-File 'newfile.csv'

and I have had limited success with a similar approach for column 2 (I can't seem to get -join to work with `r or `n):

$file = (Get-Content 'samplefile.txt')

$data = foreach ($line in $file) {
    if ($line -match '^[A-Z].*') {
        [pscustomobject]@{
            'Part2' = (Select-String '^[A-Z].*' -InputObject $line) -replace '*', ''
        }
    }
}
$data | Out-File 'newfile.csv'

But it escapes me how to get both working in the same code block, iterating over each section delimited by '>' and/or '*'.

Below is a sample of the data for reference.

>9392290|2983921
FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIA
GDFOUYUIOAGHIHUAGSD
>lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1
FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAW
DFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNUPAPNUNPUFSAPNUSS
FSADUHHULGWAUNUNWEANNIOEAWNUNIIIINNBSDNJLKNJKLAERGJKLHHJLKGS
DFSAQSAHUSDFAHOUHGROUGRWE*
>jfi.ZJ.superconfig_99.31|jfi.model.superconfig_99.31
ASDFUIOHPOASPNADPUNPNUSADFNUPPUOHZSABUHBAHPUDASPHAWHPOEWGHPI
GWANUEGWUNPNPEANUPUNPEAWUPOGDFPOAGIJJIEOAWIOAGPIOJSGNJHIOWEA
AUHNHIOEANPIASPNIOICBNIOASGIOEGWPIOWEPPPPSAJPOJKGPWEAIOJJPIO
FAWEIOPHGAHNIOPGWEOPPOEAWSPIOOPUIGSUIOGUIOPWAGIEOUIWEAOGUIOP
GEIOJHIOJPWEPJIOWGEIOPHGANIONIOGEWANIOEGWOPIHNNPIOEGWIJOWEAG
GEPUIEWUIOSZBHJENWNBENUEBMIPEWVMIEMUIAZWIPNBWEPEWIOJJKEAWPIA
GWEPHIOEWNPOEWANNNPIOGWREIJUOGUHIOSNJJJJJJJJKVMVIOIPEGIOEAUW
EGWIOJNENIOPIOWINPEAWNPOI*

CodePudding user response:

I suggest using a -split operation:

(Get-Content -Raw samplefile.txt) -split '(?m)^>(.+)' -ne '' |
  ForEach-Object -Begin { $i = 0 } -Process {
    if (++$i % 2) {          # 1st, 3rd, ... result, i.e. the ">"-prefixed lines
      $part1 = $_            # Save for later.
    } else {                 # 2nd, 4th, ... result, i.e. the all-uppercase lines
      [pscustomobject] @{   # Construct and output a custom object.
        Part1 = $part1
        Part2 = $_ -replace '\r?\n|\*$' # Remove newlines and trailing "*"
      }
    }
  }  # pipe to Export-Csv as needed.

To-display output:

Part1                                                  Part2
-----                                                  -----
9392290|2983921                                        FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIAGDFOUYUIOAGHIHUAGSD
lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1 FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAWDFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNU…
jfi.ZJ.superconfig_99.31|jfi.model.superconfig_99.31   ASDFUIOHPOASPNADPUNPNUSADFNUPPUOHZSABUHBAHPUDASPHAWHPOEWGHPIGWANUEGWUNPNPEANUPUNPEAWUPOGDFPOAGIJJIEOAWIO…
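
For reference, here is a minimal, self-contained sketch of the same -split idea that also writes the result to a real CSV file. It is not the only way to do this. It assumes an abbreviated copy of the sample data held in a here-string instead of samplefile.txt, a hypothetical output path in the temp folder, and a slightly stricter pattern ([^\r\n]+ instead of .+) chosen here so that files with Windows line endings do not leave a carriage return in Part1. Export-Csv is used rather than Out-File because Out-File writes the formatted table text, not comma-separated values.

```powershell
# Abbreviated sample input, held in a here-string so the sketch runs on its own.
$text = @'
>9392290|2983921
FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIA
GDFOUYUIOAGHIHUAGSD
>lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1
FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAW
DFSAQSAHUSDFAHOUHGROUGRWE*
'@

# -split keeps the text matched by a capture group in its output, so the result
# alternates between header text (without ">") and the block that follows it.
# -ne '' drops the empty string that precedes the first header.
$tokens = $text -split '(?m)^>([^\r\n]+)' -ne ''

# Walk the tokens in pairs: even index = header, odd index = sequence block.
$rows = for ($i = 0; $i -lt $tokens.Count - 1; $i += 2) {
    [pscustomobject]@{
        Part1 = $tokens[$i]
        Part2 = $tokens[$i + 1] -replace '[\r\n*]'   # strip CR, LF and "*"
    }
}

# Hypothetical output location; adjust as needed.
$csvPath = Join-Path ([IO.Path]::GetTempPath()) 'newfile.csv'
$rows | Export-Csv -Path $csvPath -NoTypeInformation
```

Reading the file back with Import-Csv $csvPath should give one object per '>' section, with Part1 and Part2 as the two columns.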