I have timestamped data that I want to turn into a 1-minute time series, filling the newly created rows with the last observed value. However, I also want a limit on the forward fill: if the next input is missing for too long, those rows should be left as NaN.
Data:
timestamp pay
2020-10-10 23:32 50
2020-10-11 21:55 80
2020-10-13 23:28 40
I convert it to a 1-minute series with df.set_index('timestamp').asfreq('1Min', method='ffill'), which forward fills the pay column until the next input. But if the next input is more than 24 hours (1440 rows) away, I only want to fill up to 1440 rows.
So 2020-10-11 21:55 80 should only be filled with 80 until 2020-10-12 21:55, and the rows should then stay NaN until 2020-10-13 23:28 40.
How can I achieve this?
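A minimal reproduction of the setup, assuming the timestamp column can be parsed with pd.to_datetime (the name `unlimited` is just for illustration). It shows why asfreq alone is not enough: DataFrame.asfreq accepts a fill method but has no limit argument, so every gap is filled, however long it is.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]),
    "pay": [50, 80, 40],
})

# asfreq(method='ffill') has no limit parameter, so every created row is filled
unlimited = df.set_index("timestamp").asfreq("1min", method="ffill")

print(len(unlimited))                  # one row per minute from first to last timestamp
print(unlimited["pay"].isna().sum())   # no NaN left, even across the 2-day gap
```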
Answer:
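I think you can use resample() together with ffill() and its limit option. Can you try this:
df['timestamp'] = pd.to_datetime(df['timestamp'])  # resample() needs a DatetimeIndex
output = df.set_index('timestamp').sort_index().resample('1Min').ffill(limit=1440)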
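A self-contained sketch of this approach on the question's sample data, checking the boundary the question asks about (the variable name `out` is illustrative). With limit=1440, ffill keeps the observed row and adds at most 1440 filled rows after it, so 80 should reach exactly 2020-10-12 21:55 and the next minute should be NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": ["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"],
    "pay": [50, 80, 40],
})
# resample() needs a DatetimeIndex, so parse the strings first
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Upsample to 1-minute rows; fill at most 1440 consecutive missing rows after each observation
out = df.set_index("timestamp").sort_index().resample("1min").ffill(limit=1440)

print(out.loc[pd.Timestamp("2020-10-12 21:55"), "pay"])  # last filled row for 80
print(out.loc[pd.Timestamp("2020-10-12 21:56"), "pay"])  # limit reached, so NaN
```

Note that the filled column becomes float64, because NaN cannot be stored in an integer column.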
Answer:
Building on Clegane's very good answer, I would add that sort_index() is not needed here, since the data is already in chronological order, and that, if each value should cover exactly one day, the limit should be 1439: the original row plus 1439 filled rows makes 1440 rows, i.e. one full day. Therefore:
output = df.set_index('timestamp').resample('1Min').ffill(limit=1439)
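The off-by-one between the two answers comes down to whether the original observation counts toward the window. A small sketch deriving the limit from durations; `limit_for` is a hypothetical helper, not a pandas function, and it assumes the window is a whole multiple of the resampling step.

```python
import pandas as pd

def limit_for(window: pd.Timedelta, step: pd.Timedelta, include_original: bool = True) -> int:
    """Hypothetical helper: number of rows ffill may add so a value spans `window`."""
    rows = window // step  # e.g. 1 day // 1 minute == 1440 rows
    # If the observed row itself counts as part of the window, fill one row fewer
    return rows - 1 if include_original else rows

print(limit_for(pd.Timedelta("1D"), pd.Timedelta("1min")))                          # this answer's reading
print(limit_for(pd.Timedelta("1D"), pd.Timedelta("1min"), include_original=False))  # the first answer's reading
```

The first call gives 1439 (value covers 1440 rows including the original); the second gives 1440 (1440 filled rows after the original, matching "until 2020-10-12 21:55" in the question).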
Quality Check
To check that it works correctly:
output['pay'].value_counts()
Returns:
50.0 1343 # less than a day of range, so fully filled
80.0 1440 # more than a day of range, so capped at 1440
40.0 1
Name: pay, dtype: int64
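A runnable version of this check on the sample data, comparing both limits side by side (the `results` dict is only for illustration). With limit=1439 the counts should match the output above; with limit=1440 the 80 run gains one more row, while 50 is unaffected because its gap (1343 minutes) never reaches either cap.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-10-10 23:32", "2020-10-11 21:55", "2020-10-13 23:28"]),
    "pay": [50, 80, 40],
})

results = {}
for limit in (1439, 1440):
    filled = df.set_index("timestamp").resample("1min").ffill(limit=limit)
    # value_counts() skips NaN, so the counts show how many rows each value covers
    results[limit] = filled["pay"].value_counts().to_dict()
    print(limit, results[limit])
```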