Given a dataframe df
as follows:
date value 20211003 20211010 20211017
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3510.221000 NaN NaN
4 2021-10-17 3572.3662 3465.737012 3534.220800 NaN
5 2021-10-24 3582.6036 3479.107035 3539.856801 3514.420400
6 2021-10-31 3547.3361 3421.161235 3481.911001 3456.474600
7 2021-11-7 3491.5677 3370.140147 3439.284539 3416.621024
8 2021-11-14 3539.1002 3319.289523 3391.930037 3370.079953
9 2021-11-21 3560.3734 3261.343723 3333.984237 3312.134153
10 2021-11-28 3564.0894 3255.328902 3338.967086 3305.054247
11 2021-12-5 3607.4320 3313.274702 3396.912886 3363.000047
12 2021-12-12 3666.3479 3371.220502 3450.172564 3412.234440
13 2021-12-19 3632.3638 NaN 3466.930383 3428.683490
14 2021-12-26 3618.0535 NaN NaN 3370.737690
Let's say the columns after value
column (20211003
, 20211010
and 20211017
) are rolling forecast result of value
, instead of 10 values for each column, I'll need to keep 3 values only and get the result as follows:
date value 20211003 20211010 20211017
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3510.221000 NaN NaN
4 2021-10-17 3572.3662 3465.737012 3534.220800 NaN
5 2021-10-24 3582.6036 3479.107035 3539.856801 3514.420400
6 2021-10-31 3547.3361 NaN 3481.911001 3456.474600
7 2021-11-7 3491.5677 NaN NaN 3416.621024
How could I achieve that in Pandas? Thanks.
CodePudding user response:
Assuming there is a single stretch per column, use a cumcount
of the non-NA values to determine which ones to keep, then slice and reindex each column:
cols = ['20211003', '20211010', '20211017']
N = 3
cc = df[cols].apply(lambda s: s.groupby((m:=s.notna())).cumcount().where(m))
m1 = ~cc.min(1).gt(N)
masks = ~cc.ge(N)
out = df.apply(lambda s: s[masks[s.name] if s.name in masks else m1]
.reset_index(drop=True))
output:
date value 20211003 20211010 20211017
0 2021-9-19 3613.9663 NaN NaN NaN
1 2021-9-26 3613.0673 NaN NaN NaN
2 2021-10-3 3568.1668 NaN NaN NaN
3 2021-10-10 3592.1666 3510.221000 NaN NaN
4 2021-10-17 3572.3662 3465.737012 3534.220800 NaN
5 2021-10-24 3582.6036 3479.107035 3539.856801 3514.420400
6 2021-10-31 3547.3361 NaN 3481.911001 3456.474600
7 2021-11-7 3491.5677 NaN NaN 3416.621024
8 2021-11-14 3539.1002 NaN NaN NaN