Sorry, I am new with pandas and I'm struggling. Basically, I have two dataset df1:
sites | nb samples |
---|---|
A | 3 |
B | 2 |
C | 1 |
df2:
sites1 | sites2 | nb links |
---|---|---|
A | B | 3 |
A | C | 1 |
What I want to get is output:
sites1 | sites2 | nb links | nb samples sites1 | nb samples sites2 |
---|---|---|---|---|
A | B | 3 | 3 | 2 |
A | C | 1 | 3 | 1 |
I tried several things like:
print(pd.merge(df2,df1 ,left_on='sites1',right_on=['sites']))
But iam far away from the result I want to get.
Many thanks in advance for some helps.
CodePudding user response:
A one-liner using pandas.DataFrame.merge
and .rename()
df_new = df2.merge(df1, left_on='sites1', right_on='sites', how='left').merge(df1, left_on='sites2', right_on='sites', how='left')[['sites1', 'sites2', 'nb links', 'nb samples_x', 'nb samples_y']].rename(columns={'nb samples_x': 'nb samples sites1', 'nb samples_y': 'nb samples sites2'})
[Out]:
sites1 sites2 nb links nb samples sites1 nb samples sites2
0 A B 3 3 2
1 A C 1 3 1
Notes
Let us break down to make it easier to understand:
One starts by merging both dataframes
df_new = df2.merge(df1, left_on='sites1', right_on='sites', how='left').merge(df1, left_on='sites2', right_on='sites', how='left') [Out]: sites1 sites2 nb links sites_x nb samples_x sites_y nb samples_y 0 A B 3 A 3 B 2 1 A C 1 A 3 C 1
Selecting only the columns one wants to consider
df_new = df_new[['sites1', 'sites2', 'nb links', 'nb samples_x', 'nb samples_y']] [Out]: sites1 sites2 nb links nb samples_x nb samples_y 0 A B 3 3 2 1 A C 1 3 1
Renaming the columns
df_new.columns = ['sites1', 'sites2', 'nb links', 'nb samples sites1', 'nb samples sites2'] [Out]: sites1 sites2 nb links nb samples sites1 nb samples sites2 0 A B 3 3 2 1 A C 1 3 1