Imagine I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "C", "C", "D"],
                   "DAY 1": [0, 0, 4, 0, 8],
                   "DAY 2": [3, 0, 4, 1, 2],
                   "DAY 3": [0, 2, 9, 9, 6],
                   "DAY 4": [9, 2, 4, 5, 7]})
df
Out[7]:
  ID  DAY 1  DAY 2  DAY 3  DAY 4
0  A      0      3      0      9
1  B      0      0      2      2
2  C      4      4      9      4
3  C      0      1      9      5
4  D      8      2      6      7
For every row, I would like to replace all leading 0 values (the zeros that appear before the first non-zero value in that row) with NaN.
The ID column should be excluded from this condition; only the DAY columns count. The output should look like this:
  ID  DAY 1  DAY 2  DAY 3  DAY 4
0  A    nan      3      0      9
1  B    nan    nan      2      2
2  C      4      4      9      4
3  C    nan      1      9      5
4  D      8      2      6      7
Notice that the 0 in df.loc[0, "DAY 3"] is still there: it comes after the first non-zero value in that row, df.loc[0, "DAY 2"], so it doesn't meet the condition.
Could anyone help me?
CodePudding user response:
You can use a boolean cummin on a subset of the DataFrame to generate a mask for boolean indexing:
mask = (df.filter(like='DAY').eq(0).cummin(axis=1)
.reindex(columns=df.columns, fill_value=False)
)
df[mask] = float('nan')
print(df)
Output:
  ID  DAY 1  DAY 2  DAY 3  DAY 4
0  A    NaN    3.0      0      9
1  B    NaN    NaN      2      2
2  C    4.0    4.0      9      4
3  C    NaN    1.0      9      5
4  D    8.0    2.0      6      7
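DAY 1 and DAY 2 print as floats (3.0, 4.0) because NaN is a float value, so any column that receives a NaN is upcast to float64. If integer display matters, one possible option is pandas' nullable integer dtype "Int64". The sketch below is only an illustration: the names full_mask, result, day_cols and result_int are made up for this example, and it uses the non-in-place DataFrame.mask instead of the assignment shown above.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "C", "C", "D"],
                   "DAY 1": [0, 0, 4, 0, 8],
                   "DAY 2": [3, 0, 4, 1, 2],
                   "DAY 3": [0, 2, 9, 9, 6],
                   "DAY 4": [9, 2, 4, 5, 7]})

# Same mask as in the answer: True for the leading zeros of each row
full_mask = (df.filter(like="DAY").eq(0).cummin(axis=1)
               .reindex(columns=df.columns, fill_value=False))

# mask() returns a new frame with NaN where the mask is True
result = df.mask(full_mask)

# Convert the DAY columns to the nullable integer dtype (assumed to be
# acceptable here because every non-missing value is a whole number)
day_cols = [c for c in df.columns if "DAY" in c]
result_int = result.astype({c: "Int64" for c in day_cols})
print(result_int)
```

With this dtype the missing entries are shown as <NA> and the remaining values stay integers.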
Intermediate mask:
      ID  DAY 1  DAY 2  DAY 3  DAY 4
0  False   True  False  False  False
1  False   True   True  False  False
2  False  False  False  False  False
3  False   True  False  False  False
4  False  False  False  False  False
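To see why cummin produces this mask, it may help to build it one step at a time. The following is a minimal sketch of the same idea; the intermediate names (days, is_zero, leading, full_mask, result) are chosen here for illustration, and the last step uses the non-in-place DataFrame.mask so the original df is left unchanged.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "C", "C", "D"],
                   "DAY 1": [0, 0, 4, 0, 8],
                   "DAY 2": [3, 0, 4, 1, 2],
                   "DAY 3": [0, 2, 9, 9, 6],
                   "DAY 4": [9, 2, 4, 5, 7]})

# Step 1: keep only the DAY columns, so ID never takes part in the test
days = df.filter(like="DAY")

# Step 2: True wherever a value equals 0
is_zero = days.eq(0)

# Step 3: cumulative minimum along each row. Since False < True, the
# result stays True only until the first False (first non-zero value)
# and is False from there on, so it marks exactly the leading zeros.
leading = is_zero.cummin(axis=1)

# Step 4: add the ID column back as all False so the mask matches df
full_mask = leading.reindex(columns=df.columns, fill_value=False)

# Step 5: replace the masked cells with NaN in a new DataFrame
result = df.mask(full_mask)
print(result)
```

Row 0 shows the effect of step 3: is_zero is [True, False, True, False], and the cumulative minimum turns it into [True, False, False, False], which is why the later 0 in DAY 3 is kept.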