I have the data as shown in the table. I want to use Python. For all the fruits that exist in the year 2016 and 2017, I want the frequencies of country in 2015 for those fruits.
Country | Fruit | Year |
---|---|---|
Germany | Apple | 2015 |
France | Apple | 2015 |
France | Apple | 2015 |
Spain | Apple | 2015 |
Germany | Banana | 2015 |
France | Banana | 2015 |
France | Apple | 2016 |
Spain | Apple | 2016 |
Germany | Banana | 2016 |
France | Banana | 2016 |
France | Banana | 2017 |
France | Grapes | 2017 |
The final table I want looks like below:
Fruit | Germany | France | Spain |
---|---|---|---|
Apple | 1 | 2 | 1 |
Banana | 1 | 1 | 0 |
Grapes | 0 | 0 | 0 |
CodePudding user response:
Filter by Year and then pivot it with pivot_table
:
(df[df.Year == 2015]
.pivot_table('Year', 'Fruit', 'Country', aggfunc='count')
.reindex(
index=df.Fruit.unique(),
columns=df.Country.unique()
).fillna(0)
.reset_index())
Country Fruit Germany France Spain
0 Apple 1.0 2.0 1.0
1 Banana 1.0 1.0 0.0
2 Grapes 0.0 0.0 0.0
Another option is to use crosstab
and then select 2015 from the result:
(pd.crosstab(df.Fruit, [df.Country, df.Year])
.loc[:, pd.IndexSlice[:, 2015]]
.droplevel(1, 1)
.reset_index())
Country Fruit France Germany Spain
0 Apple 2 1 1
1 Banana 1 1 0
2 Grapes 0 0 0
CodePudding user response:
Try:
df_2015 = df[df['Year'] == 2015]
pd.crosstab(df_2015['Fruit'], df_2015['Country']).reindex(df['Fruit'].unique(), fill_value=0)
Output:
Country France Germany Spain
Fruit
Apple 2 1 1
Banana 1 1 0
Grapes 0 0 0