I am attempting to split a csv file based on unique column values into multiple files using awk. I am able to split the csv successfully with awk -F\, '{print > $2".csv"}' example.csv
however it is committing the header column from the new files.
For example:
example.csv
Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
Green,Ford,Explorer
Black,Ford,F-150
Result:
Kia.csv
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv
Green,Ford,Explorer
Black,Ford,F-150
My desired output:
Kia.csv
Color,Car,Make
Red,Kia,Spectra
Orange,Kia,Sportage
___________________
Ford.csv
Color,Car,Make
Green,Ford,Explorer
Black,Ford,F-150
To attempt getting the header column passed to the new files, I attempted something like this awk -F'|' 'FNR==1{hdr=$0;next} {if (!seen[$1] ) print hdr>$2; print>$2}' example.csv
but unfortunately this did not have the intended result.
CodePudding user response:
You are almost there. Would you please try:
awk -F, '
FNR==1 {header = $0; next}
!seen[$2] {print header > $2".csv"}
{print > $2".csv"}
' example.csv
If you have many varieties of car makes, "too many open files error" may occur. In such cases, please close files referring @RavinderSingh13's answer.
CodePudding user response:
1st solution: With your shown samples, please try following awk
code.
awk -F, '
FNR==NR{
header=$0
next
}
{
outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
close(prev)
print header > (outputFile)
}
{
print $0 > (outputFile)
prev=outputFile
}
' <(head -1 Input_file) <(tail -n 2 Input_file | sort -t, -k2)
2nd solution: Adding solution with only 1 pass of reading Input_file.
awk -F, -v header=$(head -1 Input_file) '
{
outputFile=$2".csv"
}
prev!=$2".csv" || !prev{
close(prev)
print header > (outputFile)
}
{
print $0 > (outputFile)
prev=outputFile
}
' <(tail -n 2 Input_file | sort -t, -k2)