Home > Software engineering >  A different merge
A different merge

Time:01-24

So I have two tables and thoses are the samples:

df1:

Element Range Family
Ae_aag2/0013F 5-2500 Chuviridae
Ae_aag2/0014F 300-2100 Flaviviridae

df2:

Element Range Family
0012F 30-720 Chuviridae
0013F 23-1200 Chuviridae
0013F 1300-2610 Xinmoviridae

And I need to join the tables in the following logic:

Element_df1 Element_df2 Family_df1 Family_df2
Ae_aag2/0013F "0013F:23-1200,0013F:1300-2610" Chuviridae "Chuviridae,Xinmoviridae"

I need the common rows in the two dataframes of the column (Element) in one line, saving the element of the first and second and also the family of the first and second. If the 3 elements are common, in the two df, it should join the 3 in one single line.

I tried using the merge in pandas, but it gets me two lines, not one as I needed: enter image description here

I searched and didn't find how make exceptions on how to merge the two dataframe. I tried using groupby afterwards but kind make worst :(

Unfortunately I don't have much knowledge on working with pandas. Please be kind I'm new at the subject.

CodePudding user response:

Use:

df1.drop(columns='Range').merge(
    df2.assign(group=lambda d: d['Element'],
               Element=lambda d: d['Element'] ':' d['Range'])
       .groupby('group')[['Element', 'Family']].agg(','.join),
    left_on=df1['Element'].str.extract('/(.*)$', expand=False),
    right_index=True, suffixes=('_df1', '_df2')  
)#.drop(columns='key_0') # uncomment to remove the key

Output:

   key_0    Element_df1  Family_df1                    Element_df2               Family_df2
0  0013F  Ae_aag2/0013F  Chuviridae  0013F:23-1200,0013F:1300-2610  Chuviridae,Xinmoviridae
  • Related