I have the following data
country | goodCode | value |
---|---|---|
CN | 27000400 | 12 000 000 |
KZ | 15000000 | 23 000 000 |
... | ... | ... |
overall there are over 150 different countries and over 12000 different goods (i.e. distinct goodCode)
How I can generate the following table :
country | goodCode | Totalvalue |
---|---|---|
CN | 27000400 | 400 000 000 |
15000000 | 100 000 000 | |
... | ... | ... |
KZ | 15000000 | 244 000 000 |
27000400 | 130 000 000 |
Basically I want to group entire dataset by country and then by goodCode and then display it as for each country top good (i.e. goodCode) and total sum of this good.
When i applied df.groupby(["country" ,"goodCode"]).sum().reset_index()
i had the following table
country | goodCode | Totalvalue |
---|---|---|
CN | 27000400 | 400 000 000 |
CN | 15000000 | 100 000 000 |
CN | 45000000 | 700 000 000 |
CN | 65000000 | 1 700 000 000 |
... | ... | ... |
KZ | 15000000 | 244 000 000 |
KZ | 27000400 | 130 000 000 |
KZ | 37000400 | 530 000 000 |
It is 1. unsorted 2. Country name in each row.
When I applied sort_values(by=["Totalvalue"], ascending = False)
The DataFrame messed up. Countries mixed but sorted by value. It was something like:
country | goodCode | Totalvalue |
---|---|---|
CN | 27000400 | 1 700 000 000 |
KZ | 15000000 | 700 000 000 |
AN | 45000000 | 200 000 000 |
CN | 65000000 | 100 000 000 |
... | ... | ... |
CA | 15000000 | 50 000 000 |
AE | 27000400 | 25 000 000 |
KZ | 37000400 | 20 000 000 |
Can anyone help?
CodePudding user response:
You can use:
out = (df.groupby(['country', 'goodCode'], as_index=False).sum()
.sort_values(['country', 'Totalvalue'], ascending=[True, False]))
For example:
>>> df
country goodCode Totalvalue
0 CN 27000400 1700000000
1 KZ 15000000 700000000
2 AN 45000000 200000000
3 CN 65000000 100000000
4 CA 15000000 50000000
5 AE 27000400 25000000
6 KZ 37000400 20000000
>>> df.sort_values(['country', 'Totalvalue'], ascending=[True, False])
country goodCode Totalvalue
5 AE 27000400 25000000
2 AN 45000000 200000000
4 CA 15000000 50000000
0 CN 27000400 1700000000
3 CN 65000000 100000000
1 KZ 15000000 700000000
6 KZ 37000400 20000000
CodePudding user response:
The first part is good, after that, you can do sort the values using df.sort_values(by=["country", "Totalvalue"])
which will sort by country first, then within the same country to sort by Totalvalue
df = df.groupby(["country" ,"goodCode"]).sum().reset_index()
df = df.sort_values(by=["country", "Totalvalue"], ascending=[True, False])