I have a dataframe such as
df1
Groups Value1 Value2 List_sp
G1 2 3 [Segment_1_1-300_ __Sp1, Segment_2_301-400_-__Sp2]
G1 4 5 [Segment_3_400-500_ __Sp3, Segment_2_600-700_ __Sp4]
G2 6 7 [Segment_12_800-900_-__Sp1]
G2 8 9 [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]
and df2
Groups sp
G1 Segment_1_1-300_ __Sp1
G1 Segment_2_301-400_-__Sp2
G1 Segment_3_400-500_ __Sp3
G1 Segment_2_600-700_ __Sp4
G2 Segment_12_800-900_-__Sp1
G2 Segment_1_400-500_-__Sp3
G2 Segment_7_600-900_ __Sp1
G2 Segment_15_1000-1900_-__Sp1
And I would like to mege df1
and df2
based on the df1['List_sp']
and df2['sp']
elements.
Here are the expected results :
Expected results:
Groups sp Value1 Value2 List_sp
G1 Segment_1_1-300_ __Sp1 2 3 [Segment_1_1-300_ __Sp1, Segment_2_301-400_-__Sp2]
G1 Segment_2_301-400_-__Sp2 2 3 [Segment_1_1-300_ __Sp1, Segment_2_301-400_-__Sp2]
G1 Segment_3_400-500_ __Sp3 4 5 [Segment_3_400-500_ __Sp3, Segment_2_600-700_ __Sp4]
G1 Segment_2_600-700_ __Sp4 4 5 [Segment_3_400-500_ __Sp3, Segment_2_600-700_ __Sp4]
G1 Segment_12_800-900_-__Sp1 6 7 [Segment_12_800-900_-__Sp1]
G2 Segment_1_400-500_-__Sp3 8 9 [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]
G2 Segment_7_600-900_ __Sp1 8 9 [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]
G2 Segment_15_1000-1900_-__Sp1 8 9 [Segment_1_400-500_-__Sp3, Segment_7_600-900_ __Sp1, Segment_15_1000-1900_-__Sp1]
If it can help here are the two dataframe in dictionnary format :
df1
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G2', 7: 'G2'}, 'sp': {0: 'Segment_1_1-300_ __Sp1', 1: 'Segment_2_301-400_-__Sp2', 2: 'Segment_3_400-500_ __Sp3', 3: 'Segment_2_600-700_ __Sp4', 4: 'Segment_12_800-900_-__Sp1', 5: 'Segment_1_400-500_-__Sp3', 6: 'Segment_7_600-900_ __Sp1', 7: 'Segment_15_1000-1900_-__Sp1'}}
df2
{'Groups;sp': {0: 'G1;Segment_1_1-300_ __Sp1', 1: 'G1;Segment_2_301-400_-__Sp2', 2: 'G1;Segment_3_400-500_ __Sp3', 3: 'G1;Segment_2_600-700_ __Sp4', 4: 'G2;Segment_12_800-900_-__Sp1', 5: 'G2;Segment_1_400-500_-__Sp3', 6: 'G2;Segment_7_600-900_ __Sp1', 7: 'G2;Segment_15_1000-1900_-__Sp1'}}
Thanks a lot for your help and time
CodePudding user response:
You may need to do explode
first then we can merge
df1.List_sp = df1.List_sp.str.strip('[|]').str.split(', ')
# noted above is try to convert the string to list , if your original df already have list type , you can just go start with below
df1['sp'] = df1.List_sp
out = df2.merge(df1.explode('sp'),how='left')
CodePudding user response:
Perhaps this will work for you:
df2['Groups;sp'].str.split(';', expand=True).rename({0: 'Groups', 1: 'sp'}, axis=1).merge(df1, on='Groups')