I have 4K files that need to be converted into CSV files. Each field needs to be extracted and then added to a CSV file in the order they are found.
I have a script that I'm hoping I can re-use for this purpose. The script below will read the first line and examine it then process the remaining lines.
I altered the script to capture JUST the first string on the first line. That works. I am able to capture "ADJMT-HIS-WK". I now need to process the rest of the file.
As you can see the rest of the file consists of two numerics a column name and then more characters at the end of the line like this.
05 DTE-CALL 9(6)
I need to capture just column name (in this case for this line it's DTE-CALL) on every line. The column name appears only once per line and it is always after the two numerics. Once I have the data, I then have to output it into a CSV file consisting of all the fields I captured including the Table name from line one and all the column names from the rest of the file. Each file may have at minimum a single occurrence of each or a single table name with hundreds of column names.
My output file needs to look like this
ADJMT-HIST-WK,SRVCG-OFC-CDE,ST-CDE-FMHA,CTY-DST-CDE,SRVCG-CDE-OFC-CTY,.,.,.,. and so on.
If I can wrap each field in quotes, that would save me a step later on.
Would anyone be able to simplify this procedure for me?
RAW Data
ADJMT-HIST-WK VER 1 D SUFFIX
05 SRVCG-OFC-CDE
10 ST-CDE-FMHA 9(2)
10 CTY-DST-CDE 9
10 SRVCG-CDE-OFC-CTY 9(2)
05 DTE-CALL 9(6)
05 CR-AMT-ADJMT S9(9)V99
05 DR-AMT-ADJMT S9(9)V99
05 PROCG-STAT-INDCTR X(1)
05 DTE-DPST
10 MO-GRGRN 9(2)
10 DAY-GRGRN 9(2)
10 YR 9(2)
05 REAS-CDE-CB X(3)
05 USER-ID
10 AUTHY-CDE-TRML-OPRTR X(1)
Script
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$regex = '^\S '
ForEach ($CBLFile in $CBLFileList) {
$firstLine, $remainingLines = $CBLFile | Get-Content
if ($firstLine -cmatch $regex) {
$toRemove = $Matches[0].Trim()
Write-Host "Found Match - $toRemove " -foregroundcolor Red
# & { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
# Set-Content -LiteralPath $CBLFile.FullName
}
}
CodePudding user response:
You can use the exact same approach to extract the column names: use -{c,i,}match
to test if the string has the relevant data, $Matches
to extract.
You can use the -f
string format operator to quote each name.
$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse
$tableNameRegex = '^\S '
$columnNameRegex = '^\s*\d{2}\s (\S )'
foreach ($CBLFile in $CBLFileList) {
$firstLine, $remainingLines = $CBLFile | Get-Content
if ($firstLine -match $tableNameRegex) {
# extract table name, add quotes
$tableName = '"{0}"' -f $Matches[0]
# then do the same for all the columns names
$columnNames = foreach($line in $remainingLines){
if($line -match $columnNameRegex){
# again, add quotes before outputting
'"{0}"' -f $Matches[1]
}
}
# concatenate all names with comma, write to disk
@($tableName;$columnNames) -join ',' |Set-Content "$($CBLFile.BaseName).csv"
}
}