I have been struggling to split the contents of a text file and write them to a .csv file according to the following rules:
- The line starting with '>' should go into column 1 of the .csv.
- The all-uppercase lines should go into column 2, and each block of uppercase lines should be joined into a single string (i.e., with its `r and `n characters removed).
- The '>' and '*' characters should be removed wherever they appear.
Separately, I can get column 1 to work fairly well using:
$file = Get-Content 'samplefile.txt'
$data = foreach ($line in $file) {
  if ($line -match '^>') {
    [pscustomobject]@{
      'Part1' = $line -replace '^>', ''   # strip the leading '>'
    }
  }
}
# Export-Csv (rather than Out-File) writes actual CSV.
$data | Export-Csv 'newfile.csv' -NoTypeInformation
and I have had limited success with similar code for column 2 (I can't seem to get -join to work with `r or `n):
$file = Get-Content 'samplefile.txt'
$data = foreach ($line in $file) {
  if ($line -cmatch '^[A-Z]') {   # -cmatch is case-sensitive: uppercase lines only
    [pscustomobject]@{
      # -replace takes a regex, so '*' must be escaped as '\*'
      'Part2' = $line -replace '\*', ''
    }
  }
}
$data | Export-Csv 'newfile.csv' -NoTypeInformation
But I can't figure out how to make both work in the same code block, iterating over each section delimited by '>' and/or '*'.
Below is a sample of the data for reference.
>9392290|2983921
FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIA
GDFOUYUIOAGHIHUAGSD
>lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1
FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAW
DFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNUPAPNUNPUFSAPNUSS
FSADUHHULGWAUNUNWEANNIOEAWNUNIIIINNBSDNJLKNJKLAERGJKLHHJLKGS
DFSAQSAHUSDFAHOUHGROUGRWE*
>jfi.ZJ.superconfig_99.31|jfi.model.superconfig_99.31
ASDFUIOHPOASPNADPUNPNUSADFNUPPUOHZSABUHBAHPUDASPHAWHPOEWGHPI
GWANUEGWUNPNPEANUPUNPEAWUPOGDFPOAGIJJIEOAWIOAGPIOJSGNJHIOWEA
AUHNHIOEANPIASPNIOICBNIOASGIOEGWPIOWEPPPPSAJPOJKGPWEAIOJJPIO
FAWEIOPHGAHNIOPGWEOPPOEAWSPIOOPUIGSUIOGUIOPWAGIEOUIWEAOGUIOP
GEIOJHIOJPWEPJIOWGEIOPHGANIONIOGEWANIOEGWOPIHNNPIOEGWIJOWEAG
GEPUIEWUIOSZBHJENWNBENUEBMIPEWVMIEMUIAZWIPNBWEPEWIOJJKEAWPIA
GWEPHIOEWNPOEWANNNPIOGWREIJUOGUHIOSNJJJJJJJJKVMVIOIPEGIOEAUW
EGWIOJNENIOPIOWINPEAWNPOI*
Answer:
I suggest using a -split operation:
(Get-Content -Raw samplefile.txt) -split '(?m)^>([^\r\n]+)' -ne '' |
  ForEach-Object -Begin { $i = 0 } -Process {
    if (++$i % 2) { # 1st, 3rd, ... result, i.e. the ">"-prefixed lines (without the ">")
      $part1 = $_   # Save for later.
    } else {        # 2nd, 4th, ... result, i.e. the all-uppercase lines
      [pscustomobject] @{ # Construct and output a custom object.
        Part1 = $part1
        Part2 = $_ -replace '\r?\n|\*' # Remove newlines (LF or CRLF) and the "*" terminator
      }
    }
  } # pipe to Export-Csv as needed.
To-display output:
Part1                                                  Part2
-----                                                  -----
9392290|2983921                                        FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIAGDFOUYUIOAGHIHUAGSD
lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1 FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAWDFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNU…
jfi.ZJ.superconfig_99.31|jfi.model.superconfig_99.31   ASDFUIOHPOASPNADPUNPNUSADFNUPPUOHZSABUHBAHPUDASPHAWHPOEWGHPIGWANUEGWUNPNPEANUPUNPEAWUPOGDFPOAGIJJIEOAWIO…
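As an alternative to the -split approach, here is a sketch that reads the file line by line with a switch -Regex -File statement and writes the records with Export-Csv. It assumes every record begins with a '>' line followed by one or more all-uppercase lines, and that '*' only appears as a record terminator. The temp-file paths ($inFile, $outFile) and the demo data written to them (the first two records from the question's sample) are hypothetical placeholders; point $inFile at the real file instead.

```powershell
# Demo setup (hypothetical temp paths): write the first two sample records
# from the question to a file. Replace $inFile with your real file's path.
$tmp     = [System.IO.Path]::GetTempPath()
$inFile  = Join-Path $tmp 'samplefile.txt'
$outFile = Join-Path $tmp 'newfile.csv'
Set-Content -LiteralPath $inFile -Value @(
  '>9392290|2983921'
  'FYUOIQWEFYUOIAGSNJJJHKEWAHJKTHJEWUYIYGUIOIOIUYAFUIOWUEYOUYIA'
  'GDFOUYUIOAGHIHUAGSD'
  '>lsm.VI.superconfig_5640.1|lsm.model.superconfig_5640.1'
  'FDASJKLHJKLGAHJKDFGHJKAGJKHUIGAHIULGRUOUHWWUGUIOHZIOJSHIJMAW'
  'DFSANJKLNJLWEQUIOGFDSOIYUBHPOGANUPPUNABNPUNUPAPNUNPUFSAPNUSS'
  'FSADUHHULGWAUNUNWEANNIOEAWNUNIIIINNBSDNJLKNJKLAERGJKLHHJLKGS'
  'DFSAQSAHUSDFAHOUHGROUGRWE*'
)

$records = [System.Collections.Generic.List[object]]::new()
$header  = $null                               # Part1 of the record being built
$seq     = [System.Text.StringBuilder]::new()  # Part2, accumulated line by line

# switch -File reads the file one line at a time, and the lines arrive without
# their CR/LF characters, so there is nothing to strip before joining them.
# -CaseSensitive makes '^[A-Z]' match uppercase-initial lines only.
switch -Regex -CaseSensitive -File $inFile {
  '^>(.*)' {
    # New header line: emit the previous record (if any), then start a new one.
    if ($null -ne $header) {
      $records.Add([pscustomobject] @{ Part1 = $header; Part2 = $seq.ToString() })
    }
    $header = $Matches[1]   # header text without the leading '>'
    $null = $seq.Clear()
    continue
  }
  '^[A-Z]' {
    # Sequence line: append it, minus any '*' terminator.
    $null = $seq.Append(($_ -replace '\*'))
  }
}
# Emit the last record.
if ($null -ne $header) {
  $records.Add([pscustomobject] @{ Part1 = $header; Part2 = $seq.ToString() })
}

$records | Export-Csv -LiteralPath $outFile -NoTypeInformation
Import-Csv -LiteralPath $outFile   # show what was written
```

Because the lines never carry their newline characters in this approach, the `r / `n removal that -join was struggling with in the question simply doesn't arise; the trade-off is a bit of explicit state ($header, $seq) instead of the single -split expression.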