Home > Net >  How to pad missing columns in csv file with commas at the end
How to pad missing columns in csv file with commas at the end

Time:03-28

I have a CSV file with a variable number of 1-4 of the last 4 columns filled in. However, the rows missing values are not padded with commas, and thus programs fail to interpret the CSV correctly. How do I do add the missing commas at the end? Keep in mind this is a VERY large CSV file(500m lines, 200 gb), so it needs to be efficient.

Example of incorrect file:

col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9

Example of desired result:

col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9

CodePudding user response:

Force 4 columns with GNU awk:

awk 'BEGIN{FS=OFS=","} {NF=4; print}' file

Output:

col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9

See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR

CodePudding user response:

Using Miller and running

mlr --csv --ragged cat input.csv >output.csv

you have

col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9
  • Related