import pandas as pd
# The following is my dataset
gm = pd.read_csv('https://raw.githubusercontent.com/gerberl/6G7V0026-2223/main/datasets/gapminder.tsv', sep='\t')
gm.iloc[8:15]
The data frame looks something like this. I am trying to extract the GDP per capita of Europe over the years. I am using the following syntax:
gm.groupby(['country','year'])["gdpPercap"].sum()
How can I sum only for countries in Europe?
CodePudding user response:
First filter, then group:
import pandas as pd
gm = pd.read_csv('https://raw.githubusercontent.com/gerberl/6G7V0026-2223/main/datasets/gapminder.tsv', sep='\t')
# Keep only the rows whose continent is Europe
df_Europe = gm[gm['continent'].str.contains("Europe")]
# Sum gdpPercap for each (country, year) pair
df_Europe = df_Europe.groupby(['country', 'year'])["gdpPercap"].sum()
print(df_Europe)
# output
country year
Albania 1952 1601.056136
1957 1942.284244
1962 2312.888958
1967 2760.196931
1972 3313.422188
...
United Kingdom 1987 21664.787670
1992 22705.092540
1997 26074.531360
2002 29478.999190
2007 33203.261280
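One thing worth noting about the output above: if the dataset has one row per (country, year), as it appears to, then grouping by both columns and summing just returns the original gdpPercap values under a new index. Here is a minimal sketch of that on a small made-up table. The `toy` frame and its numbers are hypothetical stand-ins, not values from the real file, and the exact `==` comparison is used in place of `str.contains`.

```python
import pandas as pd

# Hypothetical toy table with the same column names as the gapminder file.
# The numbers are made up for illustration only.
toy = pd.DataFrame({
    "country":   ["Albania", "Albania", "Austria", "Austria", "Japan", "Japan"],
    "continent": ["Europe", "Europe", "Europe", "Europe", "Asia", "Asia"],
    "year":      [1952, 1957, 1952, 1957, 1952, 1957],
    "gdpPercap": [1600.0, 1900.0, 6100.0, 8800.0, 3200.0, 4300.0],
})

# Exact match on the continent column keeps only the European rows.
europe = toy[toy["continent"] == "Europe"]

# Group by (country, year) and sum, as in the answer above.
grouped = europe.groupby(["country", "year"])["gdpPercap"].sum()

# The same values, obtained by simply re-indexing without any aggregation.
indexed = europe.set_index(["country", "year"])["gdpPercap"]

# With one row per (country, year), both give identical values.
same = grouped.sort_index().tolist() == indexed.sort_index().tolist()
print(grouped)
print("identical to the raw values:", same)
```

So the sum only starts to aggregate anything once you drop one of the two grouping keys, for example grouping by country alone or by year alone.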
CodePudding user response:
First select the European countries, then group by country and sum, which totals gdpPercap over all years for each country:
gm.loc[gm['continent'] == 'Europe'].groupby(['country'])['gdpPercap'].sum()
country
Albania 39064.399592
Austria 244942.995352
Belgium 238809.096860
Bosnia and Herzegovina 41817.348833
Bulgaria 76608.662064
Croatia 111980.548151
Czech Republic 167040.136548
Denmark 260061.898655
Finland 209684.672008
France 226002.843925
Germany 246680.213193
Greece 167628.441995
Hungary 130658.107844
Iceland 246377.067270
Ireland 189103.274853
Italy 194942.508077
Montenegro 86496.774717
Netherlands 260986.226498
Norway 320967.678650
Poland 100998.646947
Portugal 136249.103129
Romania 87602.039683
Serbia 111660.593329
Slovak Republic 124986.368268
Slovenia 168894.985312
Spain 168357.917745
Sweden 239317.513248
Switzerland 324892.012860
Turkey 53633.440556
United Kingdom 232565.675827
Name: gdpPercap, dtype: float64
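If what you actually want is a single figure for Europe in each year, you could group by year instead of by country. The sketch below is only one possible reading of the question. It assumes the table also has a `pop` column (the gapminder file does), and the `toy` frame with its numbers is hypothetical. It shows a plain mean across countries and a population-weighted version, since adding up per-capita values across countries is hard to interpret.

```python
import pandas as pd

# Hypothetical toy table with gapminder-style columns; numbers are made up.
toy = pd.DataFrame({
    "country":   ["Albania", "Albania", "Austria", "Austria", "Japan", "Japan"],
    "continent": ["Europe"] * 4 + ["Asia"] * 2,
    "year":      [1952, 1957, 1952, 1957, 1952, 1957],
    "pop":       [1_000_000, 1_000_000, 3_000_000, 3_000_000,
                  80_000_000, 90_000_000],
    "gdpPercap": [2000.0, 3000.0, 6000.0, 7000.0, 3200.0, 4300.0],
})

# Filter first, exactly as in the answers above.
europe = toy[toy["continent"] == "Europe"]

# Unweighted mean of gdpPercap across European countries, per year.
mean_by_year = europe.groupby("year")["gdpPercap"].mean()

# Population-weighted version: total GDP divided by total population per year.
total_gdp = (europe["gdpPercap"] * europe["pop"]).groupby(europe["year"]).sum()
total_pop = europe.groupby("year")["pop"].sum()
weighted_by_year = total_gdp / total_pop

print(mean_by_year)
print(weighted_by_year)
```

On the real data you would replace `toy` with `gm`. Which of the two averages is appropriate depends on whether every country should count equally or in proportion to its population.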