So I have two tables and thoses are the samples:
df1:
Element | Range | Family |
---|---|---|
Ae_aag2/0013F | 5-2500 | Chuviridae |
Ae_aag2/0014F | 300-2100 | Flaviviridae |
df2:
Element | Range | Family |
---|---|---|
0012F | 30-720 | Chuviridae |
0013F | 23-1200 | Chuviridae |
0013F | 1300-2610 | Xinmoviridae |
And I need to join the tables in the following logic:
Element_df1 | Element_df2 | Family_df1 | Family_df2 |
---|---|---|---|
Ae_aag2/0013F | "0013F:23-1200,0013F:1300-2610" | Chuviridae | "Chuviridae,Xinmoviridae" |
I need the common rows in the two dataframes of the column (Element) in one line, saving the element of the first and second and also the family of the first and second. If the 3 elements are common, in the two df, it should join the 3 in one single line.
I tried using the merge
in pandas, but it gets me two lines, not one as I needed:
I searched and didn't find how make exceptions on how to merge the two dataframe. I tried using groupby
afterwards but kind make worst :(
Unfortunately I don't have much knowledge on working with pandas. Please be kind I'm new at the subject.
CodePudding user response:
Use:
df1.drop(columns='Range').merge(
df2.assign(group=lambda d: d['Element'],
Element=lambda d: d['Element'] ':' d['Range'])
.groupby('group')[['Element', 'Family']].agg(','.join),
left_on=df1['Element'].str.extract('/(.*)$', expand=False),
right_index=True, suffixes=('_df1', '_df2')
)#.drop(columns='key_0') # uncomment to remove the key
Output:
key_0 Element_df1 Family_df1 Element_df2 Family_df2
0 0013F Ae_aag2/0013F Chuviridae 0013F:23-1200,0013F:1300-2610 Chuviridae,Xinmoviridae