I have a dataframe like this:
import pandas as pd
import numpy as np
data = [['A', 0, 0, 0, 0], ['B', 0, 'X', 0, 0], ['C', 'X', 0, 0, 0], ['D', 0, 0, 0, 'X'], ['E', 0, 0, 'X', 0]]
df = pd.DataFrame(data, columns=['GROUP', 'V1', 'V2', 'V3', 'V4'])
  GROUP V1 V2 V3 V4
0     A  0  0  0  0
1     B  0  X  0  0
2     C  X  0  0  0
3     D  0  0  0  X
4     E  0  0  X  0
I would like to convert all values after the X to NaN, row by row. Here is the expected output:
data = [['A', 0, 0, 0, 0], ['B', 0, 'X', np.nan, np.nan], ['C', 'X', np.nan, np.nan, np.nan], ['D', 0, 0, 0, 'X'], ['E', 0, 0, 'X', np.nan]]
df_desired = pd.DataFrame(data, columns=['GROUP', 'V1', 'V2', 'V3', 'V4'])
  GROUP V1   V2   V3   V4
0     A  0    0    0    0
1     B  0    X  NaN  NaN
2     C  X  NaN  NaN  NaN
3     D  0    0    0    X
4     E  0    0    X  NaN
So I was wondering if it is possible to replace these values after the X using pandas?
CodePudding user response:
In [90]: df.mask(df.eq("X").cummax(axis=1).shift(1, axis=1, fill_value=False))
Out[90]:
  GROUP V1   V2   V3   V4
0     A  0    0    0    0
1     B  0    X  NaN  NaN
2     C  X  NaN  NaN  NaN
3     D  0    0    0    X
4     E  0    0    X  NaN
- check equality with "X" across the whole frame; this gives a True/False frame
- take the cumulative maximum row-wise (axis=1); once a row turns True, it stays True
- shift by 1 along the columns so the X's themselves are not included (fill_value=False fills the gap left in the leftmost position after the shift)
- mask where True, i.e., put NaN in every position that comes after an X
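To see what each of those steps produces, the one-liner can be split into intermediate frames. This is only a sketch built on the question's sample data; the names is_x, seen_x, after_x and result are made up here for illustration and are not part of the answer.

```python
import pandas as pd

# Sample frame from the question.
data = [['A', 0, 0, 0, 0], ['B', 0, 'X', 0, 0], ['C', 'X', 0, 0, 0],
        ['D', 0, 0, 0, 'X'], ['E', 0, 0, 'X', 0]]
df = pd.DataFrame(data, columns=['GROUP', 'V1', 'V2', 'V3', 'V4'])

# Step 1: True exactly where a cell holds 'X'.
is_x = df.eq("X")

# Step 2: cumulative maximum along each row; True from the first 'X' onward.
seen_x = is_x.cummax(axis=1)

# Step 3: shift one column to the right so the 'X' cell itself stays False.
after_x = seen_x.shift(1, axis=1, fill_value=False)

# Step 4: replace every cell where after_x is True with NaN.
result = df.mask(after_x)
print(result)
```

Note that the cumulative maximum keys on the first X in a row, so if a row held a second X further right, that one would be replaced with NaN as well.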
CodePudding user response:
Another option is where with a row-wise cumulative product:
df.where(df.ne('X').cumprod(1).shift(axis=1).ne(0))
Out[27]:
  GROUP V1   V2   V3   V4
0     A  0    0    0    0
1     B  0    X  NaN  NaN
2     C  X  NaN  NaN  NaN
3     D  0    0    0    X
4     E  0    0    X  NaN
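This answer comes without a step-by-step explanation, so here is one way to read it, again as a sketch on the question's sample data. The intermediate names (not_x, no_x_yet, shifted, keep) are hypothetical and only there to label the stages.

```python
import pandas as pd

# Sample frame from the question.
data = [['A', 0, 0, 0, 0], ['B', 0, 'X', 0, 0], ['C', 'X', 0, 0, 0],
        ['D', 0, 0, 0, 'X'], ['E', 0, 0, 'X', 0]]
df = pd.DataFrame(data, columns=['GROUP', 'V1', 'V2', 'V3', 'V4'])

# False exactly where a cell holds 'X', True everywhere else.
not_x = df.ne("X")

# Row-wise cumulative product: 1 until the first 'X', 0 from that cell onward.
no_x_yet = not_x.cumprod(axis=1)

# Shift one column to the right; the first column has no left neighbour and becomes NaN.
shifted = no_x_yet.shift(axis=1)

# NaN != 0 evaluates to True, so the first column is always kept.
keep = shifted.ne(0)

# where keeps cells whose condition is True and puts NaN elsewhere.
result = df.where(keep)
print(result)
```

It is the mirror image of the mask approach: where takes a frame of cells to keep, while mask takes a frame of cells to blank out.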