I have a DF with a column A containing locations name (i.e. city Xxx, village Yyy, town Zzz) and another DF1 with a column B containing cleaned locations name (without city, village etc.) and columnt C with corresponding state names. I want to match values in DF column A with DF1 column B and if in column copy col B and Col C to DF example
I have the next code which works but way too slow:
for index in range(len(DF)):
for name in range(len(DF1)):
if data1.loc[name, 'B'] in data.loc[index, 'A']:
data.loc[index, 'B'] = data1.loc[name, 'B']
data.loc[index, 'C'] = data1.loc[name, 'C']
Is there any way to make this faster?
CodePudding user response:
You can do it by making the carthesian product of the two dataframes, then create a check column which says if B is in A and then filter by it:
merged_df = df.merge(df1, how="cross")
merged_df['check'] = merged_df.apply(lambda x: x.B in x.A, axis=1)
df = merged_df[merged_df['check']].drop(["check"], axis=1)
CodePudding user response:
You could just split DF into multiple sections and use multiprocessing or multithreading for each section to run the processes simultaneously. This link might be useful.