Home > database >  How to set second and following occurrences of 0 to NaN in python
How to set second and following occurrences of 0 to NaN in python

Time:06-29

I am trying to figure out how to set all second occurrences of 0 in my dataframe to NaN in python. However, this is a tranposed dataframe so 0 would occur across the columns. To explain, I have the following dataframe:

     A   B   C   D   E   F   G
-------------------------------
0   55  34  30  29  15    0  0
1   63  59  49  32  21   16  0   
2   42  24  12   5   0    0  0
3   25  6    0   0   0    0  0
4   13  0    0   0   0    0  0

And so rather than calculated occurrences of 0 within a column, I want to count occurrences of 0 across each row (apologies if I did not word this corresctly, and if I should not have called this "transposed").

And so, for each row in this dataframe, starting from column "A" and going to column "G" in order, I want to see each occurrence of 0 after the first occurence to NaN. And so I want to produce the following output dataframe:

     A     B     C      D     E     F     G
--------------------------------------------
0   55    34     30    29    15     0   NaN
1   63    59     49    32    21    16     0   
2   42    24     12     5     0   NaN   NaN
3   25     6      0   NaN   NaN   NaN   NaN
4   13     0    NaN   NaN   NaN   NaN   NaN

How can I set all of those 0s after the first occurrence of 0 across each row to NaN? I know it is more straightforward to simply set all 0 to NaN, but I cannot figure out a simple way to tell python I want to separate all occurrences of 0 after the first to be set to NaN.

To explain the reasoning for why I would want to do this, I am trying to plot each row in my dataframe across the letters (as the x-axis). And so having all those 0s in a row will lead to unnecessary points in the plot, as I only am interested in the point where the curve hits 0. A trail of 0s will lead to a differently shaped curve than only containing a single instance of 0. And so I want to only plot each row until the initial point at which 0 is hit (a decay curve).

CodePudding user response:

Compute the cumulative counts of zeros over the rows using df.eq(0).cumsum(axis=1) and then fill the values with zero count greater than 1 using DataFrame.mask

res = df.mask(df.eq(0).cumsum(axis=1).gt(1))

Output:

>>> res

    A   B     C     D     E     F    G
0  55  34  30.0  29.0  15.0   0.0  NaN
1  63  59  49.0  32.0  21.0  16.0  0.0
2  42  24  12.0   5.0   0.0   NaN  NaN
3  25   6   0.0   NaN   NaN   NaN  NaN
4  13   0   NaN   NaN   NaN   NaN  NaN

Intermediate Outputs:

>>> df.eq(0).cumsum(axis=1) 

   A  B  C  D  E  F  G
0  0  0  0  0  0  1  2
1  0  0  0  0  0  0  1
2  0  0  0  0  1  2  3
3  0  0  1  2  3  4  5
4  0  1  2  3  4  5  6

>>> df.eq(0).cumsum(axis=1).gt(1)

       A      B      C      D      E      F      G
0  False  False  False  False  False  False   True
1  False  False  False  False  False  False  False
2  False  False  False  False  False   True   True
3  False  False  False   True   True   True   True
4  False  False   True   True   True   True   True
  • Related