Home > Mobile >  Faster way to fetch rows from one dataframe based on values from another
Faster way to fetch rows from one dataframe based on values from another

Time:01-21

I have one dataframe (df_lookup) which has 2 columns, another dataframe (df_master) that has over 100 columns.

I want to be able to pull all rows from df_master into a third dataframe if BOTH the values of two columns present in df_lookup match in df_master.

I am currently using the below brute force way of doing this:

for index, row in df_lookup.iterrows():    
    for index1, row1 in df_master.iterrows():
        if ((row['col 1'] == row1['col1']) and (row['col 2'] == row1['col2'])):
            df_new.loc[len(df_new)] = row1

I am hoping if there is a faster way of looking up since df_master is huge.

CodePudding user response:

the first way is using from merge method:

df = df_lookup.merge(df_master, how= 'inner', on=['col1', 'col2'])

CodePudding user response:

You can use merge:

df_new = df_lookup.merge(df_master,left_on=['col 1', 'col 2'],right_on=['col1','col2'])

CodePudding user response:

if col1 and col2 are strings or numbers or dates or each type that can convert to string format, you can use from the below code and it is faster than merge method:

df = df_master[(df_master.col1.map(str) df_master.col2.map(str)).isin((df_lookup.col1.map(str) df_lookup.col2.map(str)).unique())]

if your columns are strings, delete .map method and your code can run very fast

  • Related