Append multiple CSVs ignoring repeated headers

Time:07-30

I have many CSV files with the same fields and the same name ("data.csv"). Each file has a header row followed by several data lines, and each one sits in a different folder.

E.g. folder1 has a csv called data.csv:

NAME, COUNTRY  
JOHN, USA
MARY, Panama

folder2 has a csv called data.csv:

NAME, COUNTRY  
James, UK
Jim, India

folder3 has a csv called data.csv:

NAME, COUNTRY  
James, UK
Jim, India

Now I want to combine all the CSVs into one file, without repeating the header.

So far I am doing:

 find . -name "data.csv" | xargs cat > mergedCSV

This works fine, except that the header is repeated once for every file.

CodePudding user response:

You can use csvstack from the handy csvkit package to concatenate multiple CSV files with the same layout:

find . -name data.csv | xargs csvstack > mergedCSV

CodePudding user response:

You can do this easily with Miller, using its built-in "cat" verb:

find . -name data.csv | xargs mlr --csv cat

If you want pretty-printed output, here with three files as input:

mlr --opprint --barred --icsv cat a/data.csv b/data.csv c/data.csv
+-------+------------+
| NAME  |  COUNTRY   |
+-------+------------+
| JOHN  |  USA       |
| MARY  |  Panama    |
| James |  UK        |
| Jim   |  India     |
| James |  UK        |
| Jim   |  India     |
+-------+------------+
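
For setups where neither csvkit nor Miller is installed, the same idea (keep the first header, drop the rest) can be sketched with plain find and awk. This is only a minimal sketch and not taken from the answers above: the temporary directory, the sample files, and the merge_csvs helper name are made up for illustration, and it assumes every data.csv ends with a newline and has a single-line header.

```shell
# Hypothetical demo layout: three folders, each holding a data.csv
# with the same header (sample data copied from the question).
workdir=$(mktemp -d)
mkdir -p "$workdir/folder1" "$workdir/folder2" "$workdir/folder3"
printf 'NAME, COUNTRY\nJOHN, USA\nMARY, Panama\n' > "$workdir/folder1/data.csv"
printf 'NAME, COUNTRY\nJames, UK\nJim, India\n'   > "$workdir/folder2/data.csv"
printf 'NAME, COUNTRY\nJames, UK\nJim, India\n'   > "$workdir/folder3/data.csv"

# merge_csvs is an illustrative helper, not an existing tool.
# In awk, FNR is the line number within the current file and NR is the
# line number across all files. The pattern prints every line except
# the first line of a file, unless it is the very first line read.
merge_csvs() {
  find "$1" -name data.csv -exec awk 'FNR > 1 || NR == 1' {} +
}

# The output file is not named data.csv, so find does not pick it up.
merge_csvs "$workdir" > "$workdir/mergedCSV"
cat "$workdir/mergedCSV"
```

Because find hands the paths to awk directly through -exec ... {} +, folder names containing spaces should not cause problems. With a very large number of files, find may split them across several awk runs, and each run would then emit its own header. The order of the data rows follows find's traversal order, which is not guaranteed to be sorted.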