Improve time of looping comparisons and assignments in pandas or numpy

Time:09-05

I have two DataFrames:

df1

X Y
a c
b d

df2

Z Q
i f
j h

The number of rows and columns is not fixed.

I need to compare X and Z: whenever an element of X equals an element of Z (suppose a is equal to j), the value in Y on the same row as a (c) must be replaced by the value in Q on the same row as j (h).

Like so:

for k in range(len(df1)):
    for p in range(len(df2)):
        # when X in row k equals Z in row p, copy Q from row p into Y of row k
        if df1.iloc[k]['X'] == df2.iloc[p]['Z']:
            df1.at[k, 'Y'] = df2.iloc[p]['Q']

With large DataFrames this procedure is far too slow: it performs len(df1) × len(df2) comparisons, each through row-wise .iloc lookups. Does anyone know how to speed it up? I have read that NumPy offers vectorization; how could that be applied here? Thanks for the help!

Answer:

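If both DataFrames have the same length and index, and you only need to compare values that sit on the same row (row i of X against row i of Z), np.where does it in one step. Note that this does not search all of Z for each element of X, and df1.X == df2.Z raises an error if the two indexes differ: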

import numpy as np
df1["Y"] = np.where(df1.X == df2.Z, df2.Q, df1.Y)
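For the behaviour described in the question, where any element of X may match any element of Z, a lookup with Series.map is one possible vectorized replacement for the nested loop. This is a minimal sketch, not the answerer's method: it assumes the sample values below (made up to mirror the question, with a matching j, stored in Z as "a") and that, as in the loop, a later matching row of df2 should win when Z contains duplicates.

```python
import pandas as pd

# Sample frames shaped like the question's (values are illustrative only).
df1 = pd.DataFrame({"X": ["a", "b"], "Y": ["c", "d"]})
df2 = pd.DataFrame({"Z": ["i", "a"], "Q": ["f", "h"]})

# Build a Z -> Q lookup. keep="last" mirrors the nested loop, where a later
# matching row of df2 overwrites the result of an earlier one.
lookup = df2.drop_duplicates(subset="Z", keep="last").set_index("Z")["Q"]

# Only overwrite Y where X has a match in Z; all other rows keep their Y.
mask = df1["X"].isin(lookup.index)
df1.loc[mask, "Y"] = df1.loc[mask, "X"].map(lookup)

print(df1)
```

Here c is replaced by h because a appears in Z, while d is kept because b has no match. The work is one hash-based lookup per row instead of len(df1) × len(df2) comparisons.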