This is my dataframe:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'a': [np.nan, np.nan, np.nan, 3333, np.nan, np.nan, 10, np.nan, np.nan, np.nan, np.nan, 200, 100],
        'b': [np.nan, 20, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 100, np.nan, np.nan, np.nan, np.nan]
    }
)
And this is the output that I want:
         a      b
0      NaN    NaN
1      NaN   20.0
2      NaN    NaN
3   3333.0    NaN
4      NaN    NaN
5      NaN    NaN
6      NaN    NaN
7      NaN    NaN
8      NaN  100.0
9      NaN    NaN
10     NaN    NaN
11   200.0    NaN
12     NaN    NaN
Basically, whenever a value in column b is not NaN, I want to keep only the first non-NaN value in column a that follows it, and set the rest of the values in column a to NaN until the next non-NaN value in column b.
For example, the first case is 20 in column b. After that I want to keep 3333, because it is the first non-NaN value below it, and replace 10 with NaN, because I have already kept one value (3333) below that entry in b. The same applies to 100 in column b.
I've searched many posts on Stack Overflow and tried a few lines of code, but none of them worked. I guess it can maybe be done with fillna.
CodePudding user response:
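One approach is to group the rows by the running count of non-NaN values in b, then keep only the first non-NaN a in each group: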
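# True where column a holds a value
a_notna = df['a'].notna()
# Each non-NaN b starts a new group; within a group, count the non-NaN a values
# seen so far and flag only the row where that count first reaches 1
m = (a_notna.groupby(df['b'].notna().cumsum())
            .cumsum()
            .eq(1) & a_notna)
# Keep a where the mask is True; everything else becomes NaN
df['a'] = df['a'].where(m)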
print(df)
         a      b
0      NaN    NaN
1      NaN   20.0
2      NaN    NaN
3   3333.0    NaN
4      NaN    NaN
5      NaN    NaN
6      NaN    NaN
7      NaN    NaN
8      NaN  100.0
9      NaN    NaN
10     NaN    NaN
11   200.0    NaN
12     NaN    NaN
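To see why the mask works, it may help to look at the intermediate Series side by side. The sketch below rebuilds the frame from the question and splits the one-liner into named steps. The names block, seen, keep, steps and result are only illustrative, and the explicit astype(int) is an added precaution rather than something the approach above requires.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame({
    "a": [nan, nan, nan, 3333, nan, nan, 10, nan, nan, nan, nan, 200, 100],
    "b": [nan, 20, nan, nan, nan, nan, nan, nan, 100, nan, nan, nan, nan],
})

# True where column a holds a value.
a_notna = df["a"].notna()

# Block id: increases by 1 on every row where b holds a value,
# so all rows from one b value up to the next share an id.
block = df["b"].notna().cumsum()

# Running count of non-NaN a values inside each block.
seen = a_notna.astype(int).groupby(block).cumsum()

# The count stays at 1 on the NaN rows that follow the first hit,
# so "& a_notna" is needed to select only the row that holds the value.
keep = a_notna & seen.eq(1)

# Side-by-side view of the intermediate steps (for inspection only).
steps = pd.DataFrame(
    {"a": df["a"], "b": df["b"], "block": block, "seen": seen, "keep": keep}
)
print(steps)

# Non-mutating variant: build a new frame instead of overwriting df["a"].
result = df.assign(a=df["a"].where(keep))
print(result)
```

One caveat, assuming it matters for your data: rows before the first non-NaN b all fall into block 0, so the first non-NaN a in that leading stretch would also be kept. If such values should be dropped, adding & block.gt(0) to the mask is one way to exclude them.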