Home > database >  DataFrame replace multiple keys in a column
DataFrame replace multiple keys in a column

Time:11-27

Given 2 DataFrames :

DF1 (a Serie) :

Col_Name_1
[['A, B']
['B']
['A, C']
['B, C']]

DF2 :

Col_Name_X, Col_Name_Y
[['A', 'Paris'] 
['B', 'London']
['C', 'Mexico']]

In DF1, I need to replace all values A, B, C, using DF2 which is corresponding table. Critical point is that in some rows, there are multiple values to replace : ['A, B'] for instance in first row.

I tried 3 ways : 1- split Col_Name_1 to have each value in a separate column and loop with a merge fonction 2- create a dictionnary from DF2 and use replace function on DF1 3- create a dictionnary from DF2 and use map function on DF2

Issue I have is that my DataFrames have 1 000 000 rows. 1 & 2 are consumming too much memory, CPU and time. I cannot get a result. 3 is fast but does not match multiple values from DF1 : A, B return NaN

I assume that it must exist a simple way to do this... Any idea please ?

Thanks

CodePudding user response:

Use mapping by dictionary with splitted values in column in df1 with dict.get, if no match return same value:

d = df2.set_index('Col_Name_X')['Col_Name_Y'].to_dict()

f = lambda x: ', '.join(d.get(y,y) for y in x.split(', '))
df1['Col_Name_1'] = df1['Col_Name_1'].apply(f)

print (df1)
       Col_Name_1
0   Paris, London
1          London
2   Paris, Mexico
3  London, Mexico
  • Related