Suppose I have two DataFrames:
df1:
ticker A B C
date
2022-01-01 NaN NaN 100
2022-01-02 NaN 200 NaN
2022-01-03 100 NaN NaN
2022-01-04 NaN NaN 120
df2:
ticker A B C
date
2022-01-02 145 233 100
2022-01-03 231 200 241
2022-01-04 100 200 422
2022-01-05 424 324 222
2022-01-06 400 421 320
I want to fill the values in df2
as np.nan
for each index and column, where the value in df1
is not null to get the following:
df3:
ticker A B C
date
2022-01-02 145 NaN 100
2022-01-03 NaN 200 241
2022-01-04 100 200 NaN
2022-01-05 424 324 222
2022-01-06 400 421 320
I am applying the following code:
for col in df1.columns:
idx = df1[df1[col].notna()].index
if df2[col][idx] == df1[col][idx]:
df2[col][idx] = np.nan
However, this gives the error: ValueError: The truth value of a Series is ambiguous. Use a.empty(), a.bool(), a.item(), a.any() or a.all().
How can I re-write the above loop?
CodePudding user response:
You can use reindex_like
to align df1
with df2
, then mask
the values of df2
for which the matching df1
are notna
:
out = df2.mask(df1.reindex_like(df2).notna())
To modify df2
in place:
df2[df1.reindex_like(df2).notna()] = float('nan')
Output:
A B C
date
2022-01-02 145.0 NaN 100.0
2022-01-03 NaN 200.0 241.0
2022-01-04 100.0 200.0 NaN
2022-01-05 424.0 324.0 222.0
2022-01-06 400.0 421.0 320.0
combining several conditions
df1b = df1.reindex_like(df2)
out = df2.mask(df1b.notna()&df2.ne(df1b), df2-df1b)
Output:
A B C
date
2022-01-02 145 33 100
2022-01-03 131 200 241
2022-01-04 100 200 302
2022-01-05 424 324 222
2022-01-06 400 421 320