Any optimized way to make this function work faster?

def getDiff(rows, cols, df1, df2):
    for item in zip(rows, cols):
        df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]],
                                                        df2.iloc[item[0], item[1]])
    return df1

This function marks the differences between two dataframes. It works fine, but when there are many differences or many records in the dataframes, it takes a long time.

Is there a faster way to achieve this?

CodePudding user response:

The function would be a bit easier to read as:

def getDiff(rows, cols, df1, df2):
    for r,c in zip(rows, cols):
        df1.iloc[r, c] = '{} --> {}'.format(df1.iloc[r, c], df2.iloc[r, c])
    return df1

or even:

def getDiff(rows, cols, df1, df2):
    for item in zip(rows, cols):
        df1.iloc[item] = '{} --> {}'.format(df1.iloc[item], df2.iloc[item])
    return df1

But neither version changes the speed. You don't tell us much about the dataframes, but even if they start out with numeric dtypes, writing this string into df1 turns the affected columns into object dtype.
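
Since the result ends up as object dtype anyway, one option that may help is to do the whole update on object NumPy arrays in a single indexing step, instead of calling .iloc once per cell. This is only a sketch under some assumptions: rows and cols are equal-length sequences of integer positions, and both frames have the same shape. The name get_diff_vectorized is made up for this example. Unlike the original, it returns a new frame whose columns are all object dtype and leaves df1 untouched.

```python
import numpy as np
import pandas as pd


def get_diff_vectorized(rows, cols, df1, df2):
    """Return a copy of df1 with 'old --> new' written at each (row, col)."""
    rows = np.asarray(rows, dtype=int)
    cols = np.asarray(cols, dtype=int)

    # Object arrays can hold the new strings next to the untouched values.
    a = df1.to_numpy(dtype=object, copy=True)
    b = df2.to_numpy(dtype=object)

    # Pull out all changed cells at once with integer-array indexing.
    old = a[rows, cols]
    new = b[rows, cols]

    # The string formatting is still a Python loop, but the per-cell
    # .iloc lookups and assignments are gone.
    labels = ['{} --> {}'.format(x, y) for x, y in zip(old, new)]
    a[rows, cols] = np.array(labels, dtype=object)

    return pd.DataFrame(a, index=df1.index, columns=df1.columns)
```

Whether this is actually faster for your data is something to time yourself; the gain should come from skipping the repeated .iloc calls, which tend to be expensive per call.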

pandas has some string methods that may be applied to whole series, but I'm not familiar with those.
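
For what it's worth, a whole-frame version along those lines might look like the sketch below. It builds the 'old --> new' string for every cell and then keeps it only where a boolean mask is set. The name get_diff_masked is hypothetical, and the sketch assumes df1 and df2 have the same shape and the same index and column labels, and that rows and cols hold integer positions.

```python
import numpy as np
import pandas as pd


def get_diff_masked(rows, cols, df1, df2):
    """Return a copy of df1 with 'old --> new' at the flagged positions."""
    # Boolean mask that is True only at the (row, col) pairs to rewrite.
    mask = np.zeros(df1.shape, dtype=bool)
    mask[np.asarray(rows, dtype=int), np.asarray(cols, dtype=int)] = True

    # Element-wise string concatenation over the whole frame.
    combined = df1.astype(str) + ' --> ' + df2.astype(str)

    # Keep the original value where the mask is False, else the combined string.
    return df1.astype(object).where(~mask, combined)
```

Because this formats every cell, it probably only pays off when a large share of the cells differ; with just a few differences, the extra string work may cost more than it saves.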