I have a CSV file with a variable number of 1-4 of the last 4 columns filled in. However, the rows missing values are not padded with commas, and thus programs fail to interpret the CSV correctly. How do I do add the missing commas at the end? Keep in mind this is a VERY large CSV file(500m lines, 200 gb), so it needs to be efficient.
Example of incorrect file:
col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9
Example of desired result:
col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9
CodePudding user response:
Force 4 columns with GNU awk
:
awk 'BEGIN{FS=OFS=","} {NF=4; print}' file
Output:
col1,col2,col3,col4 val2,val3,val5,val6 val2,,, val2,val3,, val2,val4,val8,val9
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
CodePudding user response:
Using Miller and running
mlr --csv --ragged cat input.csv >output.csv
you have
col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9