I am trying to figure out how to set all second occurrences of 0 in my dataframe to NaN in python. However, this is a tranposed dataframe so 0 would occur across the columns. To explain, I have the following dataframe:
A B C D E F G
-------------------------------
0 55 34 30 29 15 0 0
1 63 59 49 32 21 16 0
2 42 24 12 5 0 0 0
3 25 6 0 0 0 0 0
4 13 0 0 0 0 0 0
And so rather than calculated occurrences of 0 within a column, I want to count occurrences of 0 across each row (apologies if I did not word this corresctly, and if I should not have called this "transposed").
And so, for each row in this dataframe, starting from column "A" and going to column "G" in order, I want to see each occurrence of 0 after the first occurence to NaN. And so I want to produce the following output dataframe:
A B C D E F G
--------------------------------------------
0 55 34 30 29 15 0 NaN
1 63 59 49 32 21 16 0
2 42 24 12 5 0 NaN NaN
3 25 6 0 NaN NaN NaN NaN
4 13 0 NaN NaN NaN NaN NaN
How can I set all of those 0s after the first occurrence of 0 across each row to NaN? I know it is more straightforward to simply set all 0 to NaN, but I cannot figure out a simple way to tell python I want to separate all occurrences of 0 after the first to be set to NaN.
To explain the reasoning for why I would want to do this, I am trying to plot each row in my dataframe across the letters (as the x-axis). And so having all those 0s in a row will lead to unnecessary points in the plot, as I only am interested in the point where the curve hits 0. A trail of 0s will lead to a differently shaped curve than only containing a single instance of 0. And so I want to only plot each row until the initial point at which 0 is hit (a decay curve).
CodePudding user response:
Compute the cumulative counts of zeros over the rows using df.eq(0).cumsum(axis=1)
and then fill the values with zero count greater than 1 using DataFrame.mask
res = df.mask(df.eq(0).cumsum(axis=1).gt(1))
Output:
>>> res
A B C D E F G
0 55 34 30.0 29.0 15.0 0.0 NaN
1 63 59 49.0 32.0 21.0 16.0 0.0
2 42 24 12.0 5.0 0.0 NaN NaN
3 25 6 0.0 NaN NaN NaN NaN
4 13 0 NaN NaN NaN NaN NaN
Intermediate Outputs:
>>> df.eq(0).cumsum(axis=1)
A B C D E F G
0 0 0 0 0 0 1 2
1 0 0 0 0 0 0 1
2 0 0 0 0 1 2 3
3 0 0 1 2 3 4 5
4 0 1 2 3 4 5 6
>>> df.eq(0).cumsum(axis=1).gt(1)
A B C D E F G
0 False False False False False False True
1 False False False False False False False
2 False False False False False True True
3 False False False True True True True
4 False False True True True True True