I have a dataset hotels.csv with columns: doc_id, hotel_name, hotel_url, street, city, state, country, zip, class, price, num_reviews, CLEANLINESS, ROOM, SERVICE, LOCATION, VALUE, COMFORT, overall_ratingsource
I want to count the number of hotels in every country. How can I do that with awk? I can count the hotels for a single country such as China or the USA:
cat /home/data/hotels.csv | awk -F, '$7=="China"{n+=1} END {print n}'
But how can I do it for every country?
CodePudding user response:
Parsing CSV with awk is usually not a good idea: if some of your fields contain quoted commas, for instance, splitting on , will not work as expected. That said, associative arrays are convenient for this kind of task:
awk -F, '{num[$7]++} END{for(country in num) print country, num[country]}' /home/data/hotels.csv
Note: cat file | awk ... is useless; simply pass the file to awk as its last argument, as above.
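If some rows really do contain quoted commas (a street value like "Main St, Suite 5" is a hypothetical example), one workaround is GNU awk's FPAT variable, which describes what a field looks like rather than what separates fields. A minimal sketch, assuming gawk 4.0+ and that quoted fields contain no embedded double quotes or empty values:
gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" }   # a field is either comma-free text or a "quoted" string
NR > 1 { num[$7]++ }                            # skip the header row, then count by country (column 7)
END { for (country in num) print country, num[country] }' /home/data/hotels.csv
For anything messier (escaped quotes, empty quoted fields), a real CSV parser is the safer choice.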
CodePudding user response:
If the column names are in the first row, you can start processing from the second row (NR > 1), use the country name as the array key, and increment the value each time the same key is seen.
awk -F, 'NR > 1 {          # skip the header row
  ary[$7]++                # count occurrences of each country (column 7)
}
END {
  for (item in ary) print item, ary[item]
}
' /home/data/hotels.csv
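If you also want the list ordered by count, one option is to print the count first and pipe through sort; a small variant of the above (the flags assume GNU or BSD sort):
awk -F, 'NR > 1 { ary[$7]++ }
END { for (item in ary) print ary[item], item }
' /home/data/hotels.csv | sort -rn   # largest counts first; printing the count first keeps spaces in country names from confusing sort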