How to replace values along the row until some condition is met?

Imagine I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({"ID":["A","B","C","C","D"],
                   "DAY 1":[0, 0, 4, 0, 8],
                   "DAY 2":[3, 0, 4, 1, 2],
                   "DAY 3":[0, 2, 9, 9, 6],
                   "DAY 4":[9, 2, 4, 5, 7]})

df
Out[7]: 
  ID  DAY 1  DAY 2  DAY 3  DAY 4
0  A      0      3      0      9
1  B      0      0      2      2
2  C      4      4      9      4
3  C      0      1      9      5
4  D      8      2      6      7

I would like to go through every row and replace the leading 0 values, that is, every 0 that appears before the first non-zero value in that row.

The ID column should be excluded from this condition; it applies only to the DAY columns. The leading zeros should be replaced with NaN, so the output should look like this:

  ID  DAY 1  DAY 2  DAY 3  DAY 4
0  A    nan      3      0      9
1  B    nan    nan      2      2
2  C      4      4      9      4
3  C    nan      1      9      5
4  D      8      2      6      7

Note that the 0 in df.loc[0, "DAY 3"] is kept: it comes after the first non-zero value in that row, df.loc[0, "DAY 2"], so the condition no longer applies.

Could anyone help me?

CodePudding user response:

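You can use a boolean cummin on a subset of the DataFrame to generate a mask for boolean indexing. eq(0) flags every zero, and cummin(axis=1) keeps a flag True only while all earlier values in the row were also zero, so only the leading zeros stay flagged. The reindex then adds the ID column back to the mask as all False: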

mask = (df.filter(like='DAY').eq(0).cummin(axis=1)
           .reindex(columns=df.columns, fill_value=False)
        )

df[mask] = float('nan')

print(df)

Output:


  ID  DAY 1  DAY 2  DAY 3  DAY 4
0  A    NaN    3.0      0      9
1  B    NaN    NaN      2      2
2  C    4.0    4.0      9      4
3  C    NaN    1.0      9      5
4  D    8.0    2.0      6      7

Intermediate mask:


      ID  DAY 1  DAY 2  DAY 3  DAY 4
0  False   True  False  False  False
1  False   True   True  False  False
2  False  False  False  False  False
3  False   True  False  False  False
4  False  False  False  False  False
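
The chain above can also be unpacked step by step. The following is only a sketch of the same idea, not a replacement for the one-liner: the names day_cols, is_zero, leading_zeros and result are made up for illustration, and it swaps the boolean-indexing assignment for DataFrame.mask on the DAY columns alone, which avoids the reindex and leaves the original df unchanged. It should give the same values as the output shown above.

```python
import pandas as pd

df = pd.DataFrame({"ID": ["A", "B", "C", "C", "D"],
                   "DAY 1": [0, 0, 4, 0, 8],
                   "DAY 2": [3, 0, 4, 1, 2],
                   "DAY 3": [0, 2, 9, 9, 6],
                   "DAY 4": [9, 2, 4, 5, 7]})

# Work only on the DAY columns, so ID is never touched.
day_cols = df.filter(like="DAY").columns

# Step 1: flag every zero, wherever it sits in the row.
is_zero = df[day_cols].eq(0)

# Step 2: the cumulative minimum of booleans stays True only until the
# first False, so only the leading run of zeros remains flagged.
leading_zeros = is_zero.cummin(axis=1)

# Step 3: mask() returns a copy with the flagged cells replaced by NaN;
# writing it into a copy keeps the original df intact.
result = df.copy()
result[day_cols] = df[day_cols].mask(leading_zeros)

print(result)
```

Keeping is_zero and leading_zeros as separate variables makes it easy to print each one and see where a zero is dropped from the mask, such as the 0 in row 0, "DAY 3".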