a1 | a2 | a3 | Last_Not_NaN_Value |
---|---|---|---|
1 | NaN | NaN | 1 |
0 | 0 | NaN | 0 |
NaN | 5 | NaN | 5 |
So far I've managed to get the last non-NaN value in each row this way:
data.ffill(axis=1).iloc[:, -1]
But I also need to replace that value with NaN (i.e. drop it from the DataFrame).
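For reference, a minimal sketch that rebuilds the example from the table above and runs the ffill step. The frame contents are copied from the table; the float dtypes are an assumption.

```python
import numpy as np
import pandas as pd

# Example frame rebuilt from the table in the question (dtypes assumed float)
data = pd.DataFrame({
    "a1": [1, 0, np.nan],
    "a2": [np.nan, 0, 5],
    "a3": [np.nan, np.nan, np.nan],
})

# Forward-fill along each row; the last column then holds
# the last non-NaN value of every row
last_not_nan = data.ffill(axis=1).iloc[:, -1]
print(last_not_nan.tolist())  # [1.0, 0.0, 5.0]
```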
CodePudding user response:
Create a boolean mask to identify non-NaN values, then calculate the cumsum along axis=1, then mask the values in the original dataframe where the cumsum equals its row maximum:
m = df.notna()
s = m.cumsum(1)
df.mask(s.eq(s.max(1), axis=0))
a1 a2 a3
0 NaN NaN NaN
1 0.0 NaN NaN
2 NaN NaN NaN
PS: There is no need to create an intermediate column Last_Not_NaN_Value
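The steps above can be sketched end to end as follows. The sample frame is rebuilt from the question's table, and the names `is_last`, `result` and `last_values` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table
df = pd.DataFrame({
    "a1": [1, 0, np.nan],
    "a2": [np.nan, 0, 5],
    "a3": [np.nan, np.nan, np.nan],
})

m = df.notna()          # True where a value is present
s = m.cumsum(axis=1)    # running count of non-NaN values per row

# The running count reaches its row maximum at the last non-NaN value
# and stays there for any trailing NaN columns, so masking those cells
# removes exactly one real value per row
is_last = s.eq(s.max(axis=1), axis=0)
result = df.mask(is_last)

# Optional: keep the removed values as a separate Series
last_values = df.ffill(axis=1).iloc[:, -1]
```

A row that is entirely NaN has a running count of 0 everywhere, so it is masked as a whole and simply stays NaN.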
CodePudding user response:
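One way is to use last_valid_index on each row:
import numpy as np
df = df[['a1', 'a2', 'a3']].copy()  # just in case; copy so the assignment below hits a real frame
for i, r in df.iterrows():
    col = r.last_valid_index()  # None when the whole row is NaN
    if col is not None:
        df.loc[i, col] = np.nan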
CodePudding user response:
import numpy as np
import pandas as pd
seq = (
    df  # relabel rows and columns with their ordinal positions
    .set_axis(range(df.shape[0]), axis=0)
    .set_axis(range(df.shape[1]), axis=1)
    .agg(pd.DataFrame.last_valid_index, axis=1)
)
# np.nan rather than pd.NA: a float NumPy array cannot store pd.NA
df.values[seq.index, seq] = np.nan
Here:
df is the given data frame;
seq associates each row with the position of its last valid column;
df.values is a numpy.array and a view of the values of df (this generally holds only when all columns share one dtype and Copy-on-Write is not enabled);
values[seq.index, seq] is integer array indexing, which allows selection of arbitrary items in df. Because it's a view of the original data, we can assign to it to change those values.
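If relying on df.values being a writable view feels fragile, the same integer array indexing idea can be applied to an explicit copy. This is a sketch under the assumption that all columns are numeric; the names `arr`, `valid`, `last_col` and `rows` are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table
df = pd.DataFrame({
    "a1": [1, 0, np.nan],
    "a2": [np.nan, 0, 5],
    "a3": [np.nan, np.nan, np.nan],
})

arr = df.to_numpy(dtype=float, copy=True)  # explicit copy, never a view
valid = ~np.isnan(arr)

# Column position of the last non-NaN value per row:
# argmax on the reversed mask finds the first True from the right
last_col = arr.shape[1] - 1 - valid[:, ::-1].argmax(axis=1)

# Skip rows that are entirely NaN (argmax would report a bogus position)
rows = np.flatnonzero(valid.any(axis=1))
arr[rows, last_col[rows]] = np.nan  # integer array indexing

result = pd.DataFrame(arr, index=df.index, columns=df.columns)
```

Building a new frame from the modified copy leaves df itself untouched, so the outcome does not depend on whether pandas hands out a view.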