Home > database >  How can I remove values in one DataFrame where the matching value in another DataFrame is not null?
How can I remove values in one DataFrame where the matching value in another DataFrame is not null?

Time:10-27

Suppose I have two DataFrames:

df1:
ticker           A        B        C
date
2022-01-01       NaN      NaN      100
2022-01-02       NaN      200      NaN
2022-01-03       100      NaN      NaN
2022-01-04       NaN      NaN      120

df2:
ticker           A        B        C
date
2022-01-02       145      233      100
2022-01-03       231      200      241
2022-01-04       100      200      422
2022-01-05       424      324      222
2022-01-06       400      421      320

I want to fill the values in df2 as np.nan for each index and column, where the value in df1 is not null to get the following:

df3:
ticker           A        B        C
date
2022-01-02       145      NaN      100
2022-01-03       NaN      200      241
2022-01-04       100      200      NaN
2022-01-05       424      324      222
2022-01-06       400      421      320

I am applying the following code:

for col in df1.columns:
    idx = df1[df1[col].notna()].index
    if df2[col][idx] == df1[col][idx]:
        df2[col][idx] = np.nan

However, this gives the error: ValueError: The truth value of a Series is ambiguous. Use a.empty(), a.bool(), a.item(), a.any() or a.all().

How can I re-write the above loop?

CodePudding user response:

You can use reindex_like to align df1 with df2, then mask the values of df2 for which the matching df1 are notna:

out = df2.mask(df1.reindex_like(df2).notna())

To modify df2 in place:

df2[df1.reindex_like(df2).notna()] = float('nan')

Output:

                A      B      C
date                           
2022-01-02  145.0    NaN  100.0
2022-01-03    NaN  200.0  241.0
2022-01-04  100.0  200.0    NaN
2022-01-05  424.0  324.0  222.0
2022-01-06  400.0  421.0  320.0
combining several conditions
df1b = df1.reindex_like(df2)

out = df2.mask(df1b.notna()&df2.ne(df1b), df2-df1b)

Output:

              A    B    C
date                     
2022-01-02  145   33  100
2022-01-03  131  200  241
2022-01-04  100  200  302
2022-01-05  424  324  222
2022-01-06  400  421  320
  • Related