Home > Blockchain >  Replace elements in a list in a dataframe matching elements in a list in another dataframe with matc
Replace elements in a list in a dataframe matching elements in a list in another dataframe with matc

Time:06-27

I have a pandas dataframe, df1. I have another pandas time frame, df2, with fruits columns that I would like to replace the elements inside the list that are found in the duplicates column in df1 with value of the name column in df1.

df1


 name          duplicates                         
               
0 a.apple      ['b.apple', 'c.apple']              
1 t.orange     ['arr.orange', 'pg.orange']         
2 ts.grape     ['a.grape' , 'test.grape']          
3 u.berryCool  ['X.berryCool', 'cool.berryCool']   

df2


   people     fruits                                           

0  jack       ['b.apple', 'c.apple', 'pp.tomato', 'ao.banana' ]
1  mary       ['arr.orange', 'b.apple', 'X.berryCool', 'op.mango']
2  andy       ['cool.berryCool' , 'test.grape', 'yu.papaya']   
3  lawrence   ['jc.orange', 'c.apple']                

Expected Output

 people      fruits                                                        
                                             
0 jack       ['a.apple', 'a.apple', 'pp.tomato', 'ao.banana' ]     
1 mary       ['t.orange', 'a.apple', 'u.berryCool', 'op.mango']
2 andy       ['u.berryCool' , 'ts.grape', 'yu.papaya']             
3 lawrence   ['t.orange' , 'a.apple']        
        

How can I accomplish this efficiently? Any suggestion is appreciated.

CodePudding user response:

You can create a mapping dict (Series):

MAPPING = df1.explode('duplicates').set_index('duplicates')['name']
df2['fruits'] = (df2.explode('fruits')['fruits'].replace(MAPPING)
                    .groupby(level=0).agg(list))
print(df2)

# Output
     people                                      fruits
0      jack    [a.apple, a.apple, pp.tomato, ao.banana]
1      mary  [t.orange, a.apple, u.berryCool, op.mango]
2      andy          [u.berryCool, ts.grape, yu.papaya]
3  lawrence                        [jc.orange, a.apple]

CodePudding user response:

Create dictionary by flatten values in list in column duplicates first and then mapping values with dict.get - if no match return same value:

d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}

df2['fruits'] = [[d.get(y,y) for y in x] for x in df2['fruits']]

print (df2)
     people                                      fruits
0      jack    [a.apple, a.apple, pp.tomato, ao.banana]
1      mary  [t.orange, a.apple, u.berryCool, op.mango]
2      andy          [u.berryCool, ts.grape, yu.papaya]
3  lawrence                        [jc.orange, a.apple]

Performance in 4k DataFrame: (depends of data, best test real data)

df2 = pd.concat([df2] * 1000, ignore_index=True)


In [135]: %%timeit
     ...: MAPPING = df1.explode('duplicates').set_index('duplicates')['name']
     ...: df2['fruits1'] = (df2.explode('fruits')['fruits'].replace(MAPPING).groupby(level=0).agg(list))
     ...: 
128 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [136]: %%timeit
     ...: d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}
     ...: 
     ...: df2['fruits2'] = [[d.get(y,y) for y in x] for x in df2['fruits']]
     ...: 
5.27 ms ± 245 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  • Related