I have a dataframe of time series data in which reporting starts at different times (columns) for different observation units (rows). Prior to the first reported datapoint for each unit, the dataframe contains NaN values, e.g.
0 1 2 3 4 ...
A NaN NaN 4 5 6 ...
B NaN 7 8 NaN 10...
C NaN 2 11 24 17...
I want to replace the leading (left-side) NaN values with 0, but only the leading ones (i.e. leaving the internal missing ones as NaN). So the result on the example above would be:
0 1 2 3 4 ...
A 0 0 4 5 6 ...
B 0 7 8 NaN 10...
C 0 2 11 24 17...
(Note the retained NaN for row B, col 3.)
I could iterate through the dataframe row by row, identify the index of the first non-NaN value in each row, and replace everything left of that with 0. But is there a way to do this as a whole-array operation?
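For reference, the row-by-row baseline described above could be sketched roughly as follows. This is only an illustration: the helper name fill_leading_nan_rowwise is hypothetical, the frame holds just columns 0 to 4 of the example, and treating an all-NaN row as entirely "leading" is an assumption.

```python
import numpy as np
import pandas as pd

# Example data from the question (columns 0..4 only).
df = pd.DataFrame(
    [[np.nan, np.nan, 4, 5, 6],
     [np.nan, 7, 8, np.nan, 10],
     [np.nan, 2, 11, 24, 17]],
    index=["A", "B", "C"],
)

def fill_leading_nan_rowwise(frame, value=0):
    """Hypothetical helper: fill only the NaNs left of each row's first valid value."""
    out = frame.copy()
    # Iterate over the original frame and write into the copy.
    for i, (_, row) in enumerate(frame.iterrows()):
        first = row.first_valid_index()  # column label, or None if the row is all NaN
        # Assumption: an all-NaN row counts as entirely "leading".
        n_leading = len(row) if first is None else frame.columns.get_loc(first)
        out.iloc[i, :n_leading] = value
    return out

result = fill_leading_nan_rowwise(df)
print(result)
```

This works, but it loops over rows in Python, which is what a whole-array operation would avoid.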
CodePudding user response:
Use notna and then cumsum along the rows; cells where the running count is still zero are the leading NaNs:
df[df.notna().cumsum(1) == 0] = 0
df
0 1 2 3 4
A 0.0 0.0 4 5.0 6
B 0.0 7.0 8 NaN 10
C 0.0 2.0 11 24.0 17
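To see why the one-liner works, the intermediate steps can be spelled out. The sketch below rebuilds columns 0 to 4 of the example and, as a variation on the answer rather than a copy of it, uses DataFrame.mask so the original frame is not modified in place; the variable names seen and leading are illustrative only.

```python
import numpy as np
import pandas as pd

# Example data from the question (columns 0..4 only).
df = pd.DataFrame(
    [[np.nan, np.nan, 4, 5, 6],
     [np.nan, 7, 8, np.nan, 10],
     [np.nan, 2, 11, 24, 17]],
    index=["A", "B", "C"],
)

# Running count of non-NaN values seen so far in each row.
seen = df.notna().cumsum(axis=1)

# The count is still zero only before the first non-NaN value,
# so this mask marks exactly the leading NaNs.
leading = seen.eq(0)

# mask() replaces cells where the condition is True and returns a new frame.
result = df.mask(leading, 0)
print(result)
```

The internal NaN in row B survives because the running count there is already 2, so the mask is False at that position.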
CodePudding user response:
Here is another way, using cumprod() and apply():
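# s: number of leading NaNs in each row
s = df.isna().cumprod(axis=1).sum(axis=1)
# fill at most that many NaNs per row, counting from the left
df.apply(lambda x: x.fillna(0, limit=s.loc[x.name]), axis=1)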
Output:
0 1 2 3 4
A 0.0 0.0 4.0 5.0 6.0
B 0.0 7.0 8.0 NaN 10.0
C 0.0 2.0 11.0 24.0 17.0
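One caveat worth checking: the pandas documentation for fillna says limit must be greater than 0 if it is not None, so a row with no leading NaN (where the count is 0) would presumably raise an error. A guarded variant might look like the sketch below; row D and the helper fill_row are hypothetical additions for illustration, and the explicit astype(int) is only there to make the integer arithmetic obvious.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[np.nan, np.nan, 4, 5, 6],
     [np.nan, 7, 8, np.nan, 10],
     [np.nan, 2, 11, 24, 17],
     [1, np.nan, 3, 4, 5]],      # hypothetical row D: no leading NaN
    index=["A", "B", "C", "D"],
)

# cumprod stays 1 only while every cell so far is NaN,
# so the row sum is the number of leading NaNs.
s = df.isna().astype(int).cumprod(axis=1).sum(axis=1)

def fill_row(row):
    """Hypothetical helper: fill the first n NaNs, skipping rows where n is 0."""
    n = int(s.loc[row.name])
    return row.fillna(0, limit=n) if n > 0 else row

result = df.apply(fill_row, axis=1)
print(result)
```

Because leading NaNs are by definition the first NaNs in a row, limiting the fill to that count leaves every later NaN untouched.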