Home > Blockchain >  Splitting csv into multiple files with header using awk
Splitting csv into multiple files with header using awk

Time:03-17

I am attempting to split a csv file based on unique column values into multiple files using awk. I am able to split the csv successfully with awk -F\, '{print > $2".csv"}' example.csv however it is committing the header column from the new files.

For example:

example.csv

Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150

Result:

Kia.csv

Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv

Green,Ford,Explorer
Black,Ford,F-150

My desired output:

Kia.csv

Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv

Color,Car,Make
Green,Ford,Explorer
Black,Ford,F-150

To attempt getting the header column passed to the new files, I attempted something like this awk -F'|' 'FNR==1{hdr=$0;next} {if (!seen[$1] ) print hdr>$2; print>$2}' example.csv but unfortunately this did not have the intended result.

CodePudding user response:

You are almost there. Would you please try:

awk -F, '
    FNR==1 {header = $0; next}
    !seen[$2]   {print header > $2".csv"}
    {print > $2".csv"}
' example.csv

If you have many varieties of car makes, "too many open files error" may occur. In such cases, please close files referring @RavinderSingh13's answer.

CodePudding user response:

1st solution: With your shown samples, please try following awk code.

awk -F, '
FNR==NR{
  header=$0
  next
}
{
  outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
  close(prev)
  print header > (outputFile)
}
{
  print $0 > (outputFile)
  prev=outputFile
}
' <(head -1 Input_file) <(tail -n  2 Input_file | sort -t, -k2)


2nd solution: Adding solution with only 1 pass of reading Input_file.

awk -F, -v header=$(head -1 Input_file) '
{
  outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
  close(prev)
  print header > (outputFile)
}
{
  print $0 > (outputFile)
  prev=outputFile
}
' <(tail -n  2 Input_file | sort -t, -k2)
  • Related