Replace only leading NaN values in Pandas dataframe

Time:02-14

I have a dataframe of time series data, in which data reporting starts at different times (columns) for different observation units (rows). Prior to first reported datapoint for each unit, the dataframe contains NaN values, e.g.

    0   1   2   3   4 ...
A   NaN NaN 4   5   6 ...
B   NaN 7   8   NaN 10 ...
C   NaN 2   11  24  17 ...

I want to replace only the leading (left-side) NaN values with 0, leaving any internal missing values as NaN. For the example above, the result would be:

    0   1   2   3   4 ...
A   0   0   4   5   6 ...
B   0   7   8   NaN 10 ...
C   0   2   11  24  17 ...

(Note the retained NaN in row B, column 3.)
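To make the example reproducible, here is a small fixture that rebuilds the input and the expected result. It is a sketch that covers only the five visible columns (the "..." columns are omitted), and `make_example` and `check` are hypothetical helper names, not part of the question.

```python
import numpy as np
import pandas as pd

nan = np.nan


def make_example():
    """Return (df, expected) for the five visible columns of the example."""
    df = pd.DataFrame([[nan, nan, 4, 5, 6],
                       [nan, 7, 8, nan, 10],
                       [nan, 2, 11, 24, 17]], index=["A", "B", "C"])
    expected = pd.DataFrame([[0, 0, 4, 5, 6],
                             [0, 7, 8, nan, 10],
                             [0, 2, 11, 24, 17]], index=["A", "B", "C"])
    return df, expected


def check(result: pd.DataFrame) -> None:
    # Compare values only: different approaches may leave int or float columns.
    _, expected = make_example()
    pd.testing.assert_frame_equal(result, expected, check_dtype=False)
```

Any candidate solution can then be verified with `check(solution(make_example()[0]))`.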

I could iterate through the dataframe row-by-row, identify the first index of a non-NaN value in each row, and replace everything left of that with 0. But is there a way to do this as a whole-array operation?
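For comparison, the row-by-row approach described in the question might look like the sketch below. `fill_leading_nan_loop` is a hypothetical name; it assumes unique row and column labels, and treating an all-NaN row as entirely "leading" NaN is an assumption the question does not cover.

```python
import numpy as np
import pandas as pd


def fill_leading_nan_loop(df: pd.DataFrame, value=0) -> pd.DataFrame:
    """Row-by-row baseline: replace every cell left of the first non-NaN."""
    out = df.copy()
    for label, row in df.iterrows():
        first = row.first_valid_index()  # label of first non-NaN, None if all NaN
        if first is None:
            # Assumption: an all-NaN row is filled completely.
            out.loc[label, :] = value
            continue
        pos = df.columns.get_loc(first)  # integer position (unique labels assumed)
        if pos > 0:
            # Only cells strictly left of the first valid value are replaced.
            out.loc[label, df.columns[:pos]] = value
    return out


nan = np.nan
df = pd.DataFrame([[nan, nan, 4, 5, 6],
                   [nan, 7, 8, nan, 10],
                   [nan, 2, 11, 24, 17]], index=["A", "B", "C"])
print(fill_leading_nan_loop(df))
```

This works, but the Python-level loop is exactly what the whole-array answers avoid.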

Answer:

Take the cumulative sum of notna() along each row; cells where the running count is still 0 are the leading NaNs:

df[df.notna().cumsum(1) == 0] = 0

df
     0    1   2     3   4
A  0.0  0.0   4   5.0   6
B  0.0  7.0   8   NaN  10
C  0.0  2.0  11  24.0  17
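To see why this works, it helps to look at the intermediate count. This is a minimal sketch that rebuilds only the five visible columns; using `DataFrame.mask` to return a new frame instead of assigning in place is my own variation, not part of the answer.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame([[nan, nan, 4, 5, 6],
                   [nan, 7, 8, nan, 10],
                   [nan, 2, 11, 24, 17]], index=["A", "B", "C"])

# Running count of non-NaN values seen so far in each row, left to right.
seen = df.notna().cumsum(axis=1)
print(seen)

# A cell is a leading NaN exactly when no valid value has been seen yet.
leading = seen == 0

# mask() replaces cells where the condition is True and leaves df untouched.
result = df.mask(leading, 0)
print(result)
```

In row B the count is already 2 when it reaches column 3, so the internal NaN never matches `== 0` and is kept.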

Answer:

Here is another way, using cumprod() to count the leading NaNs in each row and apply() to fill exactly that many with fillna(limit=...):

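# s = number of leading NaNs per row (the isna() product stays 1 until the first value)
s = df.isna().cumprod(axis=1).sum(axis=1)
# pandas may reject fillna(limit=0) ("Limit must be greater than 0"), so skip rows with none
df.apply(lambda x: x.fillna(0, limit=int(s.loc[x.name])) if s.loc[x.name] else x, axis=1)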

Output:

     0    1     2     3     4
A  0.0  0.0   4.0   5.0   6.0
B  0.0  7.0   8.0   NaN  10.0
C  0.0  2.0  11.0  24.0  17.0
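The per-row counts `s` can also be used without `apply()`: compare each column position with the row's count using NumPy broadcasting. This sketch assumes the columns are in time order; row D is hypothetical, added to show a row with no leading NaN but an internal one.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame([[nan, nan, 4, 5, 6],
                   [nan, 7, 8, nan, 10],
                   [nan, 2, 11, 24, 17],
                   [1, nan, 3, 4, 5]],  # hypothetical row D
                  index=["A", "B", "C", "D"])

# Leading-NaN count per row: the cumulative product of isna() stays 1
# only while every cell so far has been NaN.
s = df.isna().cumprod(axis=1).sum(axis=1)

# Position j in row i is a leading NaN iff j < s[i]; broadcasting a row
# vector of positions against a column vector of counts gives the mask.
positions = np.arange(df.shape[1])
leading = positions[None, :] < s.to_numpy()[:, None]

result = df.mask(leading, 0)
print(result)
```

Row D has a count of 0, so the mask is all False there and its internal NaN is kept, with no special case needed.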