def getDiff(rows, cols, df1, df2):
    for item in zip(rows, cols):
        df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]],
                                                       df2.iloc[item[0], item[1]])
    return df1
This function marks the differences between two dataframes. For each (row, column) position passed in, it overwrites that cell of df1 with 'df1 value --> df2 value'. It works fine, but when there are many differences or many records in the dataframes it takes a long time.
Is there a faster way to achieve this?
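For context, here is a minimal reproduction of the setup. The question doesn't say where rows and cols come from. This sketch assumes they are the positional indices returned by np.where on the mismatch mask, and the two sample frames are made up for illustration.

```python
import numpy as np
import pandas as pd

def getDiff(rows, cols, df1, df2):
    # The question's loop: one .iloc read and one .iloc write per listed cell.
    for item in zip(rows, cols):
        df1.iloc[item[0], item[1]] = '{} --> {}'.format(df1.iloc[item[0], item[1]],
                                                       df2.iloc[item[0], item[1]])
    return df1

# Hypothetical sample data with one difference per column.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
df2 = pd.DataFrame({'a': [1, 5, 3], 'b': ['x', 'y', 'w']})

# Assumption: rows/cols are the positional indices of the cells that differ.
rows, cols = np.where(df1.ne(df2).to_numpy())

# Pass an object-dtype copy so strings can be stored in the numeric column
# (recent pandas versions warn when a string is written into an int column).
result = getDiff(rows, cols, df1.astype(object), df2)
print(result)
```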
Answer:
The function would be a bit easier to read as:
def getDiff(rows, cols, df1, df2):
    for r, c in zip(rows, cols):
        df1.iloc[r, c] = '{} --> {}'.format(df1.iloc[r, c], df2.iloc[r, c])
    return df1
or even:
def getDiff(rows, cols, df1, df2):
    for item in zip(rows, cols):
        # item is already a (row, col) tuple, which .iloc accepts directly
        df1.iloc[item] = '{} --> {}'.format(df1.iloc[item], df2.iloc[item])
    return df1
But none of this changes the speed: each difference still costs separate .iloc lookups and an assignment at the Python level. You don't tell us much about the dataframes, but even if they start out with numeric dtypes, writing these strings into df1 turns the affected columns into object dtype.
pandas has some string methods (the .str accessor) that can be applied to a whole Series at once, but I'm not familiar with those.
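As a rough sketch of the whole-Series idea, one possible vectorized alternative (not something this answer proposes, and not benchmarked here) builds every 'old --> new' string at once with whole-frame string concatenation, then keeps those strings only where the frames differ, using DataFrame.mask. The name get_diff_vectorized is hypothetical. It assumes both frames share the same shape, index and columns, and it computes the mismatch mask itself instead of taking rows and cols.

```python
import pandas as pd

def get_diff_vectorized(df1, df2):
    """Return a copy of df1 where each cell that differs from df2 reads 'old --> new'.

    Hypothetical helper; assumes df1 and df2 share shape, index and columns.
    """
    # True where the cells differ (note: NaN vs NaN also counts as different).
    mask = df1.ne(df2)
    # Build all candidate strings in one whole-frame operation, no per-cell .iloc.
    combined = df1.astype(str) + ' --> ' + df2.astype(str)
    # Keep the original value where equal, the combined string where different.
    # Casting to object first lets numeric columns hold the strings.
    return df1.astype(object).mask(mask, combined)

# Hypothetical sample data.
df1 = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
df2 = pd.DataFrame({'a': [1, 5, 3], 'b': ['x', 'y', 'w']})
out = get_diff_vectorized(df1, df2)
print(out)
```

The result still ends up with object-dtype columns, as with the loop. The difference is that the string building and selection happen once per frame rather than once per differing cell.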