I have three datasets:
df1
name A B C D
0 Jeffrey Ray 0.171130 0.307791 -0.031378 0.366607
1 Crystal Roberts 0.429614 1.674201 0.176098 -1.289623
2 Paul Choute 0.396332 0.879003 1.889926 0.745100
3 Amy Clements 1.452253 0.724694 -0.830645 -0.342138
4 Stanley Coston -0.135439 -0.293293 0.684627 0.538049
df2
name A B C D
0 Jeffrey Ray 0.173430 0.399791 -0.012278 0.88897
1 Sara Flicker 0.635744 1.699001 0.176098 -1.11113
2 Jack Sether 0.323333 0.879003 1.889926 0.983640
3 Amy Clements 1.452253 0.784734 -0.673695 -0.342138
4 Stanley Coston -0.143222 -0.293293 0.683647 0.638479
df3
name A B C D
0 Jeffrey Ray 0.638273 0.687777 0.766666 0.900032
1 Sara Flicker 0.635744 1.699001 0.176098 -1.11113
2 Samantha Runyon 0.323333 -0.879003 1.889926 -0.683640
3 Amy Clements -1.452253 0.784734 -0.673695 -0.342138
4 Stanley Coston -0.143222 -0.282222 0.683647 -0.638479
I want to match the names across the three datasets and, for every name that appears in all of them, get the sum of D:

Jeffrey Ray     0.366607 + 0.88897 + 0.900032
Amy Clements    (-0.342138) + (-0.342138) + (-0.342138)
Stanley Coston  0.538049 + 0.638479 + (-0.638479)
I tried pd.merge:
pd.merge(df1, df2, on='name', how='inner').sum().reset_index()
But the output is not what I expected. Any suggestion on how I should change it? Another issue is that merge can only combine two datasets at a time; is there an efficient way to process multiple datasets at once without a for loop?
Thanks
CodePudding user response:
How about:
dfs = pd.concat([df1,df2,df3])
out = dfs.groupby('name')['D'].sum()[dfs['name'].value_counts()==3]
Output:
name
Amy Clements -1.026414
Jeffrey Ray 2.155609
Stanley Coston 0.538049
Name: D, dtype: float64
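For reference, here is a runnable sketch of this approach on a trimmed-down version of the question's data. It keeps only the name and D columns, which is all the computation needs; the frames below are hypothetical reconstructions from the values shown above:

```python
import pandas as pd

# Minimal frames with just the name/D columns from the question.
df1 = pd.DataFrame({'name': ['Jeffrey Ray', 'Crystal Roberts', 'Amy Clements', 'Stanley Coston'],
                    'D':    [0.366607, -1.289623, -0.342138, 0.538049]})
df2 = pd.DataFrame({'name': ['Jeffrey Ray', 'Sara Flicker', 'Amy Clements', 'Stanley Coston'],
                    'D':    [0.888970, -1.111130, -0.342138, 0.638479]})
df3 = pd.DataFrame({'name': ['Jeffrey Ray', 'Sara Flicker', 'Amy Clements', 'Stanley Coston'],
                    'D':    [0.900032, -1.111130, -0.342138, -0.638479]})

# Stack all rows, then sum D per name.
dfs = pd.concat([df1, df2, df3])
sums = dfs.groupby('name')['D'].sum()

# value_counts says how many frames each name appears in; the boolean mask
# aligns with the sums by index label, keeping only names present in all three.
out = sums[dfs['name'].value_counts() == 3]
print(out)
```

Names like Sara Flicker (in two frames) and Crystal Roberts (in one) drop out because the boolean mask filters them away.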
CodePudding user response:
You can concat only the D column, using the names as index. Use dropna to get rid of the names missing from any dataset, and sum:
dfs = [df1, df2, df3]
(pd.concat([d.set_index('name')['D'] for d in dfs], axis=1)
.dropna()
 .sum(axis=1)
)
output:
name
Jeffrey Ray 2.155609
Amy Clements -1.026414
Stanley Coston 0.538049
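As a sanity check, the same recipe run end-to-end on a trimmed-down version of the question's data (hypothetical frames carrying only the name and D columns, with the values shown above):

```python
import pandas as pd

# Minimal frames with just the name/D columns from the question.
df1 = pd.DataFrame({'name': ['Jeffrey Ray', 'Crystal Roberts', 'Amy Clements', 'Stanley Coston'],
                    'D':    [0.366607, -1.289623, -0.342138, 0.538049]})
df2 = pd.DataFrame({'name': ['Jeffrey Ray', 'Sara Flicker', 'Amy Clements', 'Stanley Coston'],
                    'D':    [0.888970, -1.111130, -0.342138, 0.638479]})
df3 = pd.DataFrame({'name': ['Jeffrey Ray', 'Sara Flicker', 'Amy Clements', 'Stanley Coston'],
                    'D':    [0.900032, -1.111130, -0.342138, -0.638479]})

dfs = [df1, df2, df3]
# One D-Series per frame, indexed by name; concat them side by side, drop
# rows where any frame is missing the name, then sum across the columns.
out = (pd.concat([d.set_index('name')['D'] for d in dfs], axis=1)
       .dropna()
       .sum(axis=1))
print(out)
```

Because concat with axis=1 does an outer join on the name index, a name missing from any frame gets a NaN in that column, and dropna removes it before the row-wise sum.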