Home > database >  Merge two dataframe based on a list element matches in pandas
Merge two dataframe based on a list element matches in pandas

Time:11-09

I have a dataframe such as

df1

Groups Value1 Value2 List_sp
G1     2      3      [Segment_1_1-300_ __Sp1, Segment_2_301-400_-__Sp2]
G1     4      5      [Segment_3_400-500_ __Sp3, Segment_2_600-700_ __Sp4]
G2     6      7      [Segment_12_800-900_-__Sp1]
G2     8      9      [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]

and df2

Groups sp 
G1 Segment_1_1-300_ __Sp1
G1 Segment_2_301-400_-__Sp2
G1 Segment_3_400-500_ __Sp3
G1 Segment_2_600-700_ __Sp4
G2 Segment_12_800-900_-__Sp1
G2 Segment_1_400-500_-__Sp3
G2 Segment_7_600-900_ __Sp1
G2 Segment_15_1000-1900_-__Sp1

And I would like to mege df1 and df2 based on the df1['List_sp'] and df2['sp'] elements. Here are the expected results :

Expected results: 

Groups sp                       Value1 Value2 List_sp
G1 Segment_1_1-300_ __Sp1       2      3      [Segment_1_1-300_ __Sp1, Segment_2_301-400_-__Sp2]
G1 Segment_2_301-400_-__Sp2     2      3      [Segment_1_1-300_ __Sp1, Segment_2_301-400_-__Sp2]
G1 Segment_3_400-500_ __Sp3     4      5      [Segment_3_400-500_ __Sp3, Segment_2_600-700_ __Sp4]
G1 Segment_2_600-700_ __Sp4     4      5      [Segment_3_400-500_ __Sp3, Segment_2_600-700_ __Sp4]
G1 Segment_12_800-900_-__Sp1    6      7      [Segment_12_800-900_-__Sp1]
G2 Segment_1_400-500_-__Sp3     8      9      [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]
G2 Segment_7_600-900_ __Sp1     8      9      [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]
G2 Segment_15_1000-1900_-__Sp1  8      9      [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]

If it can help here are the two dataframe in dictionnary format :

df1

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G2'}, 'sp': {0: 'Segment_1_1-300_ __Sp1', 1: 'Segment_2_301-400_-__Sp2', 2: 'Segment_3_400-500_ __Sp3', 3: 'Segment_2_600-700_ __Sp4', 4: 'Segment_12_800-900_-__Sp1', 5: 'Segment_1_400-500_-__Sp3', 6: 'Segment_7_600-900_ __Sp1', 7: 'Segment_15_1000-1900_-__Sp1'}}

df2

{'Groups;sp': {0: 'G1;Segment_1_1-300_ __Sp1', 1: 'G1;Segment_2_301-400_-__Sp2', 2: 'G1;Segment_3_400-500_ __Sp3', 3: 'G1;Segment_2_600-700_ __Sp4', 4: 'G2;Segment_12_800-900_-__Sp1', 5: 'G2;Segment_1_400-500_-__Sp3', 6: 'G2;Segment_7_600-900_ __Sp1', 7: 'G2;Segment_15_1000-1900_-__Sp1'}}

Thanks a lot for your help and time

CodePudding user response:

You may need to do explode first then we can merge

df1.List_sp = df1.List_sp.str.strip('[|]').str.split(', ')
# noted above is try to convert the string to list , if your original df already have list type , you can just go start with below 
df1['sp'] = df1.List_sp
out = df2.merge(df1.explode('sp'),how='left')

CodePudding user response:

Perhaps this will work for you:

df2['Groups;sp'].str.split(';', expand=True).rename({0: 'Groups', 1: 'sp'}, axis=1).merge(df1, on='Groups')
  • Related