i have a question taking the below set Coma separated CSV i want to run a script in bash that sums all values from colums 7,8,9 from an especific city and show the row with the max value so Original dataset:
Row,name,city,age,height,weight,good rates,bad rates,medium rates
1,john,New York,25,186,98,10,5,11
2,mike,New York,21,175,87,19,6,21
3,Sandy,Boston,38,185,88,0,5,6
4,Sam,Chicago,34,167,76,7,0,2
5,Andy,Boston,31,177,85,19,0,1
6,Karl,New York,33,189,98,9,2,1
7,Steve,Chicago,45,176,88,10,3,0
the desire output will be
Row,name,city,age,height,weight,good rates,bad rates,medium rates,max rates by city
2,mike,New York,21,175,87,19,6,21,46
5,Andy,Boston,31,177,85,19,0,1,20
7,Steve,Chicago,45,176,88,10,3,0,13
im trying with this; but it gives me only the highest rate number so 46 but i need it by city and that shows all the row, any ideas how to continue?
awk 'BEGIN {FS=OFS=","}{sum = 0; for (i=7; i<=9;i ) sum = $i} NR ==1 || sum >max {max = sum}
CodePudding user response:
You may use this awk:
awk '
BEGIN {FS=OFS=","}
NR==1 {
print $0, "max rates by city"
next
}
{
s = $7 $8 $9
if (s > max[$3]) {
max[$3] = s
rec[$3] = $0
}
}
END {
for (i in max)
print rec[i], max[i]
}' file
Row,name,city,age,height,weight,good rates,bad rates,medium rates,max rates by city
7,Steve,Chicago,45,176,88,10,3,0,13
2,mike,New York,21,175,87,19,6,21,46
5,Andy,Boston,31,177,85,19,0,1,20
or to get tabular output:
awk 'BEGIN {FS=OFS=","} NR==1{print $0, "max rates by city"; next} {s=$7 $8 $9; if (s > max[$3]) {max[$3] = s; rec[$3] = $0}} END {for (i in max) print rec[i], max[i]}' file | column -s, -t
Row name city age height weight good rates bad rates medium rates max rates by city
7 Steve Chicago 45 176 88 10 3 0 13
2 mike New York 21 175 87 19 6 21 46
5 Andy Boston 31 177 85 19 0 1 20