Given the following pandas DataFrame:
| date | days | country |
| ------------- | ---------- | --------- |
| 2022-02-01 | 1 | Spain |
| 2022-02-02 | 2 | Spain |
| 2022-02-01 | 3 | Italy |
| 2022-02-03 | 2 | France |
| 2022-02-03 | 1 | Germany |
| 2022-02-04 | 1 | Italy |
| 2022-02-04 | 1 | UK |
| 2022-02-05 | 2 | UK |
| 2022-02-04 | 5 | Spain |
| 2022-02-04 | 1 | Portugal |
I want to rank the countries by their total number of days:
| country | count_days |
| ---------------- | ----------- |
| Spain | 8 |
| Italy | 4 |
| UK | 3 |
| France | 2 |
| Germany | 1 |
| Portugal | 1 |
Finally, I want to return the countries in that order, from most to fewest days, as a list of strings.
return: countries = ['Spain', 'Italy', 'UK', 'France', 'Germany', 'Portugal']
CodePudding user response:
First aggregate with sum, then sort the values and convert the result to a DataFrame:
df1 = (df.groupby('country')['days']
         .sum()
         .sort_values(ascending=False)
         .reset_index(name='count_days'))
print(df1)
    country  count_days
0     Spain           8
1     Italy           4
2        UK           3
3    France           2
4   Germany           1
5  Portugal           1
Finally, convert the column to a list:
countries = df1['country'].tolist()
Solution without the intermediate DataFrame df1:
countries = df.groupby('country')['days'].sum().sort_values(ascending=False).index.tolist()
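For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the table in the question, and the name `totals` is only illustrative. One addition that is not in the answer above: `kind="stable"` is passed to `sort_values`, on the assumption that tied countries (Germany and Portugal, both with 1 day) should keep the alphabetical order that `groupby` produces. The default sort algorithm does not guarantee any particular order for ties.

```python
import pandas as pd

# Sample data copied by hand from the table in the question.
df = pd.DataFrame({
    "date": ["2022-02-01", "2022-02-02", "2022-02-01", "2022-02-03", "2022-02-03",
             "2022-02-04", "2022-02-04", "2022-02-05", "2022-02-04", "2022-02-04"],
    "days": [1, 2, 3, 2, 1, 1, 1, 2, 5, 1],
    "country": ["Spain", "Spain", "Italy", "France", "Germany",
                "Italy", "UK", "UK", "Spain", "Portugal"],
})

# Total days per country; groupby sorts the group keys alphabetically by default.
totals = df.groupby("country")["days"].sum()

# Sort descending. kind="stable" (an addition to the original answer) keeps
# tied countries in their existing alphabetical order.
countries = totals.sort_values(ascending=False, kind="stable").index.tolist()
print(countries)
```

If a different tie-breaking rule is needed, sorting the `reset_index` result by both columns, for example `sort_values(['count_days', 'country'], ascending=[False, True])`, makes the rule explicit.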