Change time series frequency, ffill values until the next input but with a limit

I have data with timestamps. I want to convert it into a 1-minute time series and fill the newly created rows with the last input value. However, the forward fill should also have a limit: if the next input is too far away, the remaining rows should stay NaN.

Data:

timestamp           pay
2020-10-10 23:32    50
2020-10-11 21:55    80
2020-10-13 23:28    40

I can convert it using df.set_index('timestamp').asfreq('1Min', method='ffill'), which forward fills the pay column until the next input. However, if the next input is more than 24 hours away (1440 rows), I want to fill only up to 1440 rows.

So, 2020-10-11 21:55 80 should only be filled with 80 until 2020-10-12 21:55, and the rows after that should stay NaN until 2020-10-13 23:28 40.

How can I achieve this?

CodePudding user response：

I think you can use resample and ffill with the limit option. Can you try this:

mask = df.set_index('timestamp').sort_index().resample('1Min').ffill(limit=1440)
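
For reference, here is a self-contained sketch of this approach on the sample data from the question. It assumes the timestamp column arrives as plain strings, so it is parsed with pd.to_datetime first (resample only works on a datetime-like index). The variable name filled is just illustrative.

```python
import pandas as pd

# Sample data from the question; timestamps as plain strings (assumption).
df = pd.DataFrame({
    "timestamp": ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"],
    "pay": [50, 80, 40],
})

# resample() needs a DatetimeIndex, so parse the strings first.
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Upsample to 1-minute rows and forward fill at most 1440 rows after each input.
filled = (
    df.set_index("timestamp")
      .sort_index()
      .resample("1min")
      .ffill(limit=1440)
)

print(len(filled))                  # number of 1-minute rows between first and last input
print(filled["pay"].isna().sum())   # rows left as NaN because the limit was reached
```

Rows beyond the limit stay NaN, so the long gap after 2020-10-11 21:55 is only partly filled.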

CodePudding user response：

Building on Clegane's very good answer, I would like to add that sort_index() is not needed here, since the timestamps are already in order, and that to fill exactly one day the limit should be 1439 (the original value plus 1439 filled rows makes the full day of 1440). Therefore:

output = df.set_index('timestamp').resample('1Min').ffill(limit=1439)

Quality Check

To check that it works correctly:

output['pay'].value_counts()

Returns:

50.0    1343 # Less than a day, so 100% filled
80.0    1440 # More than a day until the next input, so capped at 1440
40.0       1
Name: pay, dtype: int64
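
Another way to check the boundary is to look at the last timestamp that still carries a given value. The sketch below rebuilds the sample data from the question and compares two limits. The helper name last_filled is made up for this illustration. It relies on the fact that the limit only counts filled rows, not the original input row.

```python
import pandas as pd

# Sample data from the question, with timestamps parsed to datetimes.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]
    ),
    "pay": [50, 80, 40],
})

def last_filled(limit, value=80):
    """Return the last timestamp that still holds `value` after a limited ffill.

    Hypothetical helper, only for this check.
    """
    out = df.set_index("timestamp").resample("1min").ffill(limit=limit)
    return out.index[out["pay"] == value].max()

print(last_filled(1439))  # 2020-10-12 21:54:00
print(last_filled(1440))  # 2020-10-12 21:55:00
```

So a limit of 1439 gives exactly 1440 rows of 80 (the last one at 21:54), while a limit of 1440 also fills the 21:55 row on the next day. Which one is right depends on whether that row should be included.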