Home > database >  Issues merging multiple CSV files in Powershell
Issues merging multiple CSV files in Powershell

Time:10-29

I found a nifty command here - http://www.stackoverflow.com/questions/27892957/merging-multiple-csv-files-into-one-using-powershell that I am using to merge CSV files -

Get-ChildItem -Filter *.csv | Select-Object -ExpandProperty FullName | Import-Csv | Export-Csv .\merged\merged.csv -NoTypeInformation -Append

Now this does what it says on the tin and works great for the most part. I have 2 issues with it however, and I am wondering if there is a way they can be overcome:

Firstly, the merged csv file has CRLF line endings, and I am wondering how I can make the line endings just LF, as the file is being generated?

Also, it looks like there are some shenanigans with quote marks being added/moved around. As an example:

Sample row from initial CSV:

"2021-10-05"|"00:00"|"1212"|"160477"|"1.00"|"3.49"LF

Same row in the merged CSV:

"2021-10-05|""00:00""|""1212""|""160477""|""1.00""|""3.49"""CRLF

So see that the first row has lost its trailing quotes, other fields have doubled quotes, and the end of the row has an additional quote. I'm not quite sure what is going on here, so any help would be much appreciated!

CodePudding user response:

For dealing with the quotes, the cause of the “problem” is that your CSV does not use the default field delimiter that Import-CSV assumes - the C in CSV stands for comma, and you’re using the vertical bar. Add the parameter -Delimiter "|" to both the Import-CSV and Export-CSV cmdlets.

I don’t think you can do anything about the line-end characters (CRLF vs LF); that’s almost certainly operating-system dependent.

CodePudding user response:

Jeff Zeitlin's helpful answer explains the quote-related part of your problem well.

As for your line-ending problem:

  • As of PowerShell 7.2, there are no PowerShell-native features that allow you to control the newline format of file-writing cmdlets such as Export-Csv.

  • However, if you use plain-text processing, you can use multi-line strings built with the newline format of interest and save / append them with Set-Content and its -NoNewLine switch, which writes the input strings as-is, without a (newline) separator.

In fact, to significantly speed up processing in your case, plain-text handling is preferable, since in essence your operation amounts to concatenating text files, the only twist being that the header lines of all but the first file should be skipped; using plain-text handling also bypasses your quote problem:

$tokenCount = 1
Get-ChildItem -Filter *.csv | 
  Get-Content -Raw | 
    ForEach-Object {
      # Get the file content and replace CRLF with LF.
      # Include the first line (the header) only for the first file.
      $content = ($_ -split '\r?\n', $tokenCount)[-1].Replace("`r`n", "`n")
      $tokenCount = 2 # Subsequent files should have their header ignored.
      # Make sure that each file content ends in a LF
      if (-not $content.EndsWith("`n")) { $content  = "`n" }
      # Output the modified content.
      $content
    } | 
      Set-Content -NoNewLine ./merged/merged.csv # add -Encoding as needed.
  • Related