Replacing values with nan based on values of another column

Time:01-21

This is my dataframe:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 3333, np.nan,  np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
        'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
    }
)

And this is the output that I want:

         a      b
0      NaN    NaN
1      NaN   20.0
2      NaN    NaN
3   3333.0    NaN
4      NaN    NaN
5      NaN    NaN
6      NaN    NaN
7      NaN    NaN
8      NaN  100.0
9      NaN    NaN
10     NaN    NaN
11   200.0    NaN
12     NaN    NaN

Basically, whenever a value in column 'b' is not NaN, I want to keep the first non-NaN value that follows it in column 'a', and then set every other value in column 'a' to NaN until the next non-NaN value in column 'b'.

For example, the first non-NaN value in column 'b' is 20. After it, I want to keep 3333 because it is the first non-NaN value in column 'a' below the 20. I want to replace 10 with NaN because I have already kept one value (3333) for that 20. The same applies to 100 in column 'b': 200 is kept, and the 100 in column 'a' is replaced with NaN.
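To make the rule concrete, here is a minimal plain-loop sketch of it. The function name keep_first_a_after_b and the waiting flag are hypothetical names used only for illustration, and the sketch assumes that values in 'a' appearing before the first non-NaN 'b' should be dropped, which the sample data does not actually exercise.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
        'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
    }
)


def keep_first_a_after_b(frame):
    """Keep only the first non-NaN 'a' at or after each non-NaN 'b' (hypothetical helper)."""
    waiting = False  # True once a 'b' value was seen and no 'a' has been kept for it yet
    kept = []
    for a, b in zip(frame['a'], frame['b']):
        if pd.notna(b):
            waiting = True           # a new non-NaN 'b' re-arms the rule
        if waiting and pd.notna(a):
            kept.append(a)           # first non-NaN 'a' since the last 'b'
            waiting = False
        else:
            kept.append(np.nan)      # everything else becomes NaN
    out = frame.copy()
    out['a'] = kept
    return out


result = keep_first_a_after_b(df)
```

A loop like this is easy to check by hand but slow on large frames; it is meant as a reference for what a vectorized solution should produce.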

I've searched many posts on Stack Overflow and tried a few approaches, but none of them worked. I guess it might be possible with fillna.

CodePudding user response:

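One approach is to group the rows by the running count of non-NaN values in 'b', then keep only the first non-NaN 'a' in each group: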

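# True where 'a' has a value
a_notna = df['a'].notna()
# Each non-NaN 'b' starts a new group; within a group, count the non-NaN 'a' values seen so far
m = (a_notna.groupby(df['b'].notna().cumsum())
            .cumsum()
            .eq(1) & a_notna)
# Keep 'a' only where it is the first non-NaN value of its group
df['a'] = df['a'].where(m)
print(df)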

         a      b
0      NaN    NaN
1      NaN   20.0
2      NaN    NaN
3   3333.0    NaN
4      NaN    NaN
5      NaN    NaN
6      NaN    NaN
7      NaN    NaN
8      NaN  100.0
9      NaN    NaN
10     NaN    NaN
11   200.0    NaN
12     NaN    NaN
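
To see why the mask works, it may help to name the intermediate steps. The sketch below wraps the same idea in a function; first_a_per_b_group and its keep_leading flag are hypothetical names, not part of pandas. One thing to be aware of: rows before the first non-NaN 'b' form their own group (group 0), so the approach above would also keep the first 'a' there. The sample data has NaN in that position, so it makes no difference here; passing keep_leading=False is one possible way to drop such values, assuming that is the behavior you want.

```python
import numpy as np
import pandas as pd


def first_a_per_b_group(frame, keep_leading=True):
    """Keep the first non-NaN 'a' in each group started by a non-NaN 'b' (hypothetical helper)."""
    a_notna = frame['a'].notna()
    # Group id: 0 before the first non-NaN 'b', then 1, 2, ... after each one
    grp = frame['b'].notna().cumsum()
    # Running count of non-NaN 'a' values inside each group
    rank = a_notna.astype(int).groupby(grp).cumsum()
    # rank == 1 also holds on NaN rows after the first hit, hence the extra "& a_notna"
    mask = rank.eq(1) & a_notna
    if not keep_leading:
        # Assumption: values of 'a' before any non-NaN 'b' should be discarded
        mask = mask & grp.gt(0)
    return frame.assign(a=frame['a'].where(mask))


df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
        'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
    }
)
result = first_a_per_b_group(df)

# Small made-up frame where 'a' has a value before the first 'b'
edge = pd.DataFrame({'a': [5, np.nan, 7], 'b': [np.nan, 1, np.nan]})
edge_keep = first_a_per_b_group(edge)                      # keeps 5 and 7
edge_drop = first_a_per_b_group(edge, keep_leading=False)  # keeps only 7
```

With the default keep_leading=True the function should behave like the one-liner mask above; the flag only matters when 'a' has values ahead of the first non-NaN 'b'.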