How to drop the last non-NaN value in each row of a DataFrame?

a1	a2	a3	Last_Not_NaN_Value
1	NaN	NaN	1
0	0	NaN	0
NaN	5	NaN	5

So far I've managed to get the last non-NaN value in each row this way:

data.ffill(axis=1).iloc[:, -1]

But I also need to replace that value with NaN (that is, drop it from the DataFrame).

CodePudding user response：

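Create a boolean mask that identifies the non-NaN values, take its cumulative sum along axis=1, then mask the values in the original DataFrame where the cumulative sum equals its row maximum. Those positions are the last non-NaN cell in each row plus any NaN cells after it, which are NaN already: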

m = df.notna()
s = m.cumsum(1)
df.mask(s.eq(s.max(1), axis=0))

    a1  a2  a3
0  NaN NaN NaN
1  0.0 NaN NaN
2  NaN NaN NaN

PS: There is no need to create the intermediate column Last_Not_NaN_Value.
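For anyone who wants to see the intermediate steps, here is a minimal sketch that rebuilds the sample data from the question and runs the cumsum approach on it. The names last, exact and result are just illustrative. It also shows an optional variant (not part of the answer above) that marks only the last non-NaN cell itself, which behaves the same for this task but is easier to inspect.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, without the helper column
df = pd.DataFrame({
    "a1": [1, 0, np.nan],
    "a2": [np.nan, 0, 5],
    "a3": [np.nan, np.nan, np.nan],
})

m = df.notna()                       # True where a value is present
s = m.cumsum(axis=1)                 # running count of valid values per row
last = s.eq(s.max(axis=1), axis=0)   # True from the last valid cell onward

result = df.mask(last)               # set those cells to NaN
print(result)

# Optional variant: keep only the cells that were actually valid,
# so exactly one cell per row is marked (none for an all-NaN row)
exact = last & m
print(exact)
```

Both df.mask(last) and df.mask(exact) give the same frame here, because the extra cells marked by last are NaN to begin with.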

CodePudding user response：

One way is to use last_valid_index on each row:

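import numpy as np
df = df[['a1', 'a2', 'a3']].copy()  # just in case; keep only the data columns
for i, r in df.iterrows():
    col = r.last_valid_index()  # column label, or None if the row is all NaN
    if col is not None:
        df.loc[i, col] = np.nan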
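The same loop can be wrapped in a small function that leaves the input untouched. This is only a sketch: drop_last_valid is a made-up name, and it assumes float columns and unique index labels. The extra all-NaN row in the sample is there to show the None case.

```python
import numpy as np
import pandas as pd

def drop_last_valid(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with each row's last non-NaN value set to NaN."""
    out = frame.copy()
    for row_label, row in frame.iterrows():
        # last_valid_index() gives a column label, or None for an all-NaN row
        col_label = row.last_valid_index()
        if col_label is not None:
            out.loc[row_label, col_label] = np.nan
    return out

# Sample data from the question plus one all-NaN row
sample = pd.DataFrame({
    "a1": [1.0, 0.0, np.nan, np.nan],
    "a2": [np.nan, 0.0, 5.0, np.nan],
    "a3": [np.nan, np.nan, np.nan, np.nan],
})
cleaned = drop_last_valid(sample)
print(cleaned)
```

Since iterrows is a Python-level loop, this is fine for small frames but will be slow on large ones.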

CodePudding user response：

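import numpy as np
import pandas as pd

seq = (
    df    # relabel rows and columns by their ordinal positions
    .set_axis(range(df.shape[0]), axis=0)
    .set_axis(range(df.shape[1]), axis=1)
    .agg(pd.Series.last_valid_index, axis=1)
)
arr = df.to_numpy(dtype=float, copy=True)
arr[seq.index.to_numpy(), seq.to_numpy()] = np.nan
df = pd.DataFrame(arr, index=df.index, columns=df.columns)

Here

df is the given DataFrame, assumed to hold only numeric columns and to have at least one non-NaN value in every row (otherwise last_valid_index returns None for that row and the indexing fails);
seq maps each row position to the position of its last valid column;
arr is a writable NumPy copy of the values. df.values is a view of the data only when all columns share one dtype, and it can be read-only when Copy-on-Write is enabled, so writing through it is not reliable. np.nan is used instead of pd.NA because a float NumPy array cannot hold pd.NA;
arr[seq.index.to_numpy(), seq.to_numpy()] is integer array indexing, which selects one arbitrary cell per row, so assigning to it sets exactly those cells to NaN. The result is then wrapped back into a DataFrame with the original labels.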
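The positional idea can also be sketched in NumPy alone, without last_valid_index. This is a swapped-in variant, not the code from the answer above: it finds the last valid column with argmax on the column-reversed validity mask, and it skips rows that are entirely NaN. It assumes all columns are numeric; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one all-NaN row
df = pd.DataFrame({
    "a1": [1.0, 0.0, np.nan, np.nan],
    "a2": [np.nan, 0.0, 5.0, np.nan],
    "a3": [np.nan, np.nan, np.nan, np.nan],
})

arr = df.to_numpy(dtype=float, copy=True)   # writable copy of the values
valid = ~np.isnan(arr)                      # True where a value is present
has_valid = valid.any(axis=1)               # rows with at least one value

# argmax on the reversed columns finds the first True from the right;
# convert that back to a position counted from the left
last_pos = arr.shape[1] - 1 - np.argmax(valid[:, ::-1], axis=1)

rows = np.flatnonzero(has_valid)            # skip all-NaN rows
arr[rows, last_pos[rows]] = np.nan          # integer array indexing

result = pd.DataFrame(arr, index=df.index, columns=df.columns)
print(result)
```

Working on a copy and rebuilding the DataFrame avoids depending on whether df.values happens to be a writable view.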