How to make nested for loops work faster?-CodePudding

I have a DF with a column A containing locations name (i.e. city Xxx, village Yyy, town Zzz) and another DF1 with a column B containing cleaned locations name (without city, village etc.) and columnt C with corresponding state names. I want to match values in DF column A with DF1 column B and if in column copy col B and Col C to DF example

I have the next code which works but way too slow:

for index in range(len(DF)):
    for name in range(len(DF1)):
        if data1.loc[name, 'B'] in data.loc[index, 'A']:
            data.loc[index, 'B'] = data1.loc[name, 'B']
            data.loc[index, 'C'] = data1.loc[name, 'C']

Is there any way to make this faster?

CodePudding user response：

You can do it by making the carthesian product of the two dataframes, then create a check column which says if B is in A and then filter by it:

merged_df = df.merge(df1, how="cross")
merged_df['check'] = merged_df.apply(lambda x: x.B in x.A, axis=1)
df = merged_df[merged_df['check']].drop(["check"], axis=1)

CodePudding user response：

You could just split DF into multiple sections and use multiprocessing or multithreading for each section to run the processes simultaneously. This link might be useful.