I have two data frames: one that that has data on GDP of countries:
import pandas as pd
data = {'year': [2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002],
'country': ['France', 'France', 'France', 'Germany', 'Germany', 'Germany', 'US', 'US', 'US', "Canada", "Canada","Canada"],
'GDP': [100, 150, 165, 300, 315, 318, 700, 789, 854, 320,313, 324]
}
df1 = pd.DataFrame(data)
df1
year country GDP
0 2000 France 100
1 2001 France 150
2 2002 France 165
3 2000 Germany 300
4 2001 Germany 315
5 2002 Germany 318
6 2000 US 700
7 2001 US 789
8 2002 US 854
9 2000 Canada 320
10 2001 Canada 313
11 2002 Canada 324
Another that has data on their membership in international organizations. If a country is a member of the organization, in a particular year, it gets "1". Otherwise, it gets "0".
data = {'year': [2000, 2001, 2002, 2000, 2001, 2002],
'ioname': ['EU', 'EU', 'EU', 'NAFTA', 'NAFTA', 'NAFTA'],
'France': [1,1,1,0,0,0],
'Germany': [1,1,1,0,0,0],
'US': [0,0,0,1,1,1],
'Canada': [0,0,0,1,1,1],
}
df2 = pd.DataFrame(data)
df2
year ioname France Germany US Canada
0 2000 EU 1 1 0 0
1 2001 EU 1 1 0 0
2 2002 EU 1 1 0 0
3 2000 NAFTA 0 0 1 1
4 2001 NAFTA 0 0 1 1
5 2002 NAFTA 0 0 1 1
I wish to compute the mean GDP for each organization's members. For example, for the EU we should use the values of Germany and France only. This should be the final outcome:
data = {'year': [2000, 2001, 2002, 2000, 2001, 2002],
'ioname': ['EU', 'EU', 'EU', 'NAFTA', 'NAFTA', 'NAFTA'],
'France': [1,1,1,0,0,0],
'Germany': [1,1,1,0,0,0],
'US': [0,0,0,1,1,1],
'Canada': [0,0,0,1,1,1],
'mean_gdp': [200, 232.5, 241.5, 510, 551, 589]
}
df3 = pd.DataFrame(data)
df3
year ioname France Germany US Canada mean_gdp
0 2000 EU 1 1 0 0 200.0
1 2001 EU 1 1 0 0 232.5
2 2002 EU 1 1 0 0 241.5
3 2000 NAFTA 0 0 1 1 510.0
4 2001 NAFTA 0 0 1 1 551.0
5 2002 NAFTA 0 0 1 1 589.0
How can I do this?
CodePudding user response:
Use DataFrame.pivot
for possible multiple values in df2
with convert both columns to MultiIndex
, replace 0
to NaN
s, so possible create mean
, last add new column by DataFrame.join
:
df = df1.pivot('year','country','GDP')
s = df2.set_index(['year','ioname']).mul(df, level=0).replace(0, np.nan).mean(axis=1)
df = df2.join(s.rename('mean_gdp'), on=['year','ioname'])
print (df)
year ioname France Germany US Canada mean_gdp
0 2000 EU 1 1 0 0 200.0
1 2001 EU 1 1 0 0 232.5
2 2002 EU 1 1 0 0 241.5
3 2000 NAFTA 0 0 1 1 510.0
4 2001 NAFTA 0 0 1 1 551.0
5 2002 NAFTA 0 0 1 1 589.0