I have two DataFrames:
df1
X | Y |
---|---|
a | c |
b | d |
df2
Z | Q |
---|---|
i | f |
j | h |
The number of rows and columns is not fixed.
I need to compare X with Z: whenever an element of X equals an element of Z (suppose a equals j), the value paired with a in Y (c) should be replaced by the value paired with j in Q (h).
My current approach is a nested loop:
# compare every row of df1 with every row of df2
for k in range(len(df1)):
    for p in range(len(df2)):
        if df1.iloc[k]['X'] == df2.iloc[p]['Z']:
            df1.at[k, 'Y'] = df2.iloc[p]['Q']
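A minimal self-contained version of the loop, with hypothetical toy data shaped like df1 and df2 above, makes the expected result concrete. The values are made up for illustration: the second Z value is set to "a" so that it plays the role of j in the "a equals j" case.

```python
import pandas as pd

# Hypothetical toy data shaped like the tables above; df2's second Z value
# is "a" so that X = "a" matches it (the "a equals j" case).
df1 = pd.DataFrame({"X": ["a", "b"], "Y": ["c", "d"]})
df2 = pd.DataFrame({"Z": ["i", "a"], "Q": ["f", "h"]})

# Nested loop: len(df1) * len(df2) comparisons, each going through .iloc.
# df1.at[k, "Y"] uses k as a label, so this assumes df1 has the default RangeIndex.
for k in range(len(df1)):
    for p in range(len(df2)):
        if df1.iloc[k]["X"] == df2.iloc[p]["Z"]:
            df1.at[k, "Y"] = df2.iloc[p]["Q"]

print(df1["Y"].tolist())  # ['h', 'd']
```

If a value occurs more than once in Z, later rows overwrite earlier ones, so the last match wins.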
With large DataFrames this is far too slow, since it performs len(df1) * len(df2) comparisons, each through a relatively slow .iloc lookup. Does anyone know how to speed it up? I read that NumPy offers vectorization; how could it be applied here? Thanks for the help!
Answer:
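If the two DataFrames are row-aligned, you can do:
import numpy as np
df1["Y"] = np.where(df1.X == df2.Z, df2.Q, df1.Y)
Note that this compares the columns row by row (X in row k only against Z in row k), so df1 and df2 must have identical indexes (otherwise pandas raises "Can only compare identically-labeled Series objects"), and it only catches matches that sit in the same row. It does not cover the a/j case from the question, where the equal values are in different rows.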
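For the general case in the question, where a value of X may match Z in any row, a sketch that builds a Z → Q lookup and maps X through it with Series.map avoids the nested loop entirely. A NumPy broadcasting variant is included because the question asked about NumPy. The function names update_from_lookup and update_with_numpy are hypothetical, and both assume that when a value appears more than once in Z the last occurrence should win, as it does in the loop.

```python
import numpy as np
import pandas as pd


def update_from_lookup(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df1 where Y is replaced by df2's Q wherever X matches some Z.

    Hypothetical helper. Duplicate Z values: the last one wins, like the loop.
    """
    # Z -> Q lookup Series; Series.map needs a unique index, so drop duplicates first.
    lookup = df2.drop_duplicates("Z", keep="last").set_index("Z")["Q"]
    out = df1.copy()
    mask = out["X"].isin(lookup.index)  # rows of df1 whose X appears somewhere in Z
    out.loc[mask, "Y"] = out.loc[mask, "X"].map(lookup)
    return out


def update_with_numpy(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Same result via NumPy broadcasting; memory grows as len(df1) * len(df2)."""
    if len(df2) == 0:
        return df1.copy()  # nothing to match against
    x = df1["X"].to_numpy()
    z = df2["Z"].to_numpy()
    q = df2["Q"].to_numpy()
    eq = x[:, None] == z[None, :]  # eq[k, p] is True when X[k] == Z[p]
    has_match = eq.any(axis=1)
    # Column index of the last True in each row (argmax over the reversed columns).
    last_p = eq.shape[1] - 1 - eq[:, ::-1].argmax(axis=1)
    out = df1.copy()
    out["Y"] = np.where(has_match, q[last_p], out["Y"].to_numpy())
    return out
```

The map version is usually the better choice: it relies on a hash-based index lookup and scales roughly linearly, whereas the broadcasting version materializes a len(df1) × len(df2) boolean matrix, which can exhaust memory for large frames.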