Home > OS >  Powershell - Create CSV from files(s) with non-conforming data
Powershell - Create CSV from files(s) with non-conforming data

Time:03-04

I have 4K files that need to be converted into CSV files. Each field needs to be extracted and then added to a CSV file in the order they are found.

I have a script that I'm hoping I can re-use for this purpose. The script below will read the first line and examine it then process the remaining lines.

I altered the script to capture JUST the first string on the first line. That works. I am able to capture "ADJMT-HIS-WK". I now need to process the rest of the file.

As you can see the rest of the file consists of two numerics a column name and then more characters at the end of the line like this.

05 DTE-CALL               9(6)

I need to capture just column name (in this case for this line it's DTE-CALL) on every line. The column name appears only once per line and it is always after the two numerics. Once I have the data, I then have to output it into a CSV file consisting of all the fields I captured including the Table name from line one and all the column names from the rest of the file. Each file may have at minimum a single occurrence of each or a single table name with hundreds of column names.

My output file needs to look like this

ADJMT-HIST-WK,SRVCG-OFC-CDE,ST-CDE-FMHA,CTY-DST-CDE,SRVCG-CDE-OFC-CTY,.,.,.,. and so on.

If I can wrap each field in quotes, that would save me a step later on.

Would anyone be able to simplify this procedure for me?

RAW Data

ADJMT-HIST-WK                            VER     1 D  SUFFIX

          05 SRVCG-OFC-CDE                                                     
          10 ST-CDE-FMHA                           9(2)                       
          10 CTY-DST-CDE                           9                          
          10 SRVCG-CDE-OFC-CTY                     9(2)                       
         05 DTE-CALL                               9(6)                       
         05 CR-AMT-ADJMT                           S9(9)V99                   
         05 DR-AMT-ADJMT                           S9(9)V99                   
         05 PROCG-STAT-INDCTR                      X(1)                       
         05 DTE-DPST                                                          
          10 MO-GRGRN                              9(2)                       
          10 DAY-GRGRN                             9(2)                       
          10 YR                                    9(2)                       
         05 REAS-CDE-CB                            X(3)                       
         05 USER-ID                                                           
          10 AUTHY-CDE-TRML-OPRTR                  X(1)

Script

$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse 
$regex = '^\S '
ForEach ($CBLFile in $CBLFileList) {
  $firstLine, $remainingLines = $CBLFile | Get-Content
  if ($firstLine -cmatch $regex) {
      $toRemove = $Matches[0].Trim()
      Write-Host "Found Match - $toRemove " -foregroundcolor Red
 #   & { $firstLine -creplace $regex; $remainingLines -creplace $toRemove } |
 #     Set-Content -LiteralPath $CBLFile.FullName
  }
}

CodePudding user response:

You can use the exact same approach to extract the column names: use -{c,i,}match to test if the string has the relevant data, $Matches to extract.

You can use the -f string format operator to quote each name.

$CBLFileList = Get-ChildItem -LiteralPath "C:\IDMS" -File -Recurse 
$tableNameRegex = '^\S '
$columnNameRegex = '^\s*\d{2}\s (\S )'

foreach ($CBLFile in $CBLFileList) {
  $firstLine, $remainingLines = $CBLFile | Get-Content
  if ($firstLine -match $tableNameRegex) {
    # extract table name, add quotes
    $tableName = '"{0}"' -f $Matches[0]

    # then do the same for all the columns names
    $columnNames = foreach($line in $remainingLines){
      if($line -match $columnNameRegex){
        # again, add quotes before outputting
        '"{0}"' -f $Matches[1]
      }
    }

    # concatenate all names with comma, write to disk
    @($tableName;$columnNames) -join ',' |Set-Content "$($CBLFile.BaseName).csv"
  }
}
  • Related