Dataframe forward-fill till column-specific last valid index

Time:01-16

How do I go from:

[In]:   import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            'col1': [100, np.nan, np.nan, 100, np.nan, np.nan],
            'col2': [np.nan, 100, np.nan, np.nan, 100, np.nan]
        })
        df

[Out]:        col1    col2
        0      100     NaN
        1      NaN     100
        2      NaN     NaN
        3      100     NaN
        4      NaN     100
        5      NaN     NaN

To:

[Out]:        col1    col2
        0      100     NaN
        1      100     100
        2      100     100
        3      100     100
        4      NaN     100
        5      NaN     NaN

My current approach is to apply a custom function that works on one column at a time:

[In]:   def ffill_last_valid(s):
            last_valid = s.last_valid_index()
            s = s.ffill()
            s[s.index > last_valid] = np.nan
            return s

        df.apply(ffill_last_valid)

But that seems like overkill to me. Is there a one-liner that works on the dataframe directly?


Note on accepted answer:

See the accepted answer from mozway below.

I know it's a tiny dataframe but:

[Image not preserved]
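Since the image itself is missing, here is a rough sketch of how the two approaches could be timed against each other. It assumes the comparison was between the per-column function above and the where/cummax one-liner from the accepted answer; the helper names per_column and vectorized are made up for illustration, and no timings are claimed because they depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [100, np.nan, np.nan, 100, np.nan, np.nan],
    "col2": [np.nan, 100, np.nan, np.nan, 100, np.nan],
})


def ffill_last_valid(s):
    # Per-column function from the question.
    last_valid = s.last_valid_index()
    s = s.ffill()
    s[s.index > last_valid] = np.nan
    return s


def per_column(frame):
    # Hypothetical wrapper around the question's approach.
    return frame.apply(ffill_last_valid)


def vectorized(frame):
    # Hypothetical wrapper around the accepted answer's one-liner.
    return frame.ffill().where(frame[::-1].notna().cummax())


# Both approaches should agree before their speed is compared.
assert per_column(df).equals(vectorized(df))

# Total seconds for 200 calls of each approach.
t_apply = timeit.timeit(lambda: per_column(df), number=200)
t_vectorized = timeit.timeit(lambda: vectorized(df), number=200)
print(f"apply: {t_apply:.4f}s  where/cummax: {t_vectorized:.4f}s")
```

On a frame this small the fixed overhead of each call dominates, so a larger frame would give a more meaningful comparison.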

Answer:

You can ffill, then keep only the values up to each column's last valid value. The mask comes from combining where with notna on the reversed frame followed by cummax:

out = df.ffill().where(df[::-1].notna().cummax())

Variant, using mask with isna and cummin:

out = df.ffill().mask(df[::-1].isna().cummin())

Output:

    col1   col2
0  100.0    NaN
1  100.0  100.0
2  100.0  100.0
3  100.0  100.0
4    NaN  100.0
5    NaN    NaN
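To see why the one-liner works, it can help to build the mask in separate steps. The sketch below uses the frame from the question; the names reversed_valid, keep and expected are introduced here for illustration only, and it assumes the index labels are unique so that where can align the reversed mask back onto the original rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [100, np.nan, np.nan, 100, np.nan, np.nan],
    "col2": [np.nan, 100, np.nan, np.nan, 100, np.nan],
})

# Step 1: flag valid cells, reading the rows from bottom to top.
reversed_valid = df[::-1].notna()

# Step 2: cummax switches to True at a column's last valid value and stays
# True for every row above it; only the trailing NaN stretch remains False.
keep = reversed_valid.cummax()

# Step 3: where() aligns on index labels, so the reversed row order of
# `keep` does not have to be undone before it is applied.
out = df.ffill().where(keep)


def ffill_last_valid(s):
    # Per-column function from the question, used here as a cross-check.
    last_valid = s.last_valid_index()
    s = s.ffill()
    s[s.index > last_valid] = np.nan
    return s


expected = df.apply(ffill_last_valid)
print(out.equals(expected))
```

Leading NaNs (such as row 0 of col2) stay NaN simply because ffill has nothing to carry forward there; the mask only matters for the trailing stretch.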

Using interpolate:

In theory, df.interpolate(method='ffill', limit_area='inside') should work. Each option behaves as expected on its own, but for some reason the combination does not (tested with pandas 1.5.2). df.interpolate(method='zero', limit_area='inside') does give the expected result, though; note that method='zero' is passed to SciPy, so SciPy must be installed.
