I have this df which is a sample from the actual one. Each row represent a measurement taken with intervals of approximately 20 minutes and several other columns.
datetime value
261 2022-10-21 02:00:13 0.164299
262 2022-10-21 02:20:12 0.163009
263 2022-10-21 02:40:14 0.162298
264 2022-10-21 03:00:12 0.162846
265 2022-10-24 08:46:38 1.788360
266 2022-10-24 09:06:47 1.442915
267 2022-10-24 09:27:28 1.246774
268 2022-10-24 09:46:58 0.194352
As you can see, there's a gap between 2022-10-21 03:00:12
and 2022-10-24 08:46:38
. I need a way of first identify where the gaps are (there might be n of them), then fill those rows with the proper dates and nan's for the corresponding values.
Right now I'm generating the missing dates as follows:
# number of periods of 20 minutes between dates
periods = (df.iloc[265,0] - df.iloc[264,0]).total_seconds() / 60.0 / 20
# dates to fill gap
pd.date_range(df.iloc[264,0], df.iloc[265,0], periods=periods)
Any idea of how to do this?
Edit: Expected output (I don't need the exact same minutes I added on the new rows, they are just for reference of the overall idea)
datetime value
261 2022-10-21 02:00:13 0.164299
262 2022-10-21 02:20:12 0.163009
263 2022-10-21 02:40:14 0.162298
264 2022-10-21 03:00:12 0.162846
265 2022-10-21 03:20:00 NaN
266 2022-10-21 03:40:00 NaN
...
n-4 2022-10-24 08:26:00 NaN
n-3 2022-10-24 08:46:38 1.788360
n-2 2022-10-24 09:06:47 1.442915
n-1 2022-10-24 09:27:28 1.246774
n 2022-10-24 09:46:58 0.194352
CodePudding user response:
Try (maybe not the cleanest solution - it will try to find gaps > 40 minutes and then add 20 minute intervals with NaN
values):
mask = df["datetime"].diff() > "40 minutes"
dfs = []
for idx, row in df[mask].iterrows():
prev_time = df.loc[idx - 1, "datetime"]
dr = pd.date_range(
prev_time pd.Timedelta("20 minutes"),
row["datetime"] - pd.Timedelta("20 minutes"),
freq="20min",
)
dfs.append(
pd.DataFrame(
[{"datetime": dr}],
index=[idx - 1],
)
)
df = pd.concat([df, *dfs]).sort_index().explode("datetime")
print(df)
Prints:
datetime value
261 2022-10-21 02:00:13 0.164299
262 2022-10-21 02:20:12 0.163009
263 2022-10-21 02:40:14 0.162298
264 2022-10-21 03:00:12 0.162846
264 2022-10-21 03:20:12 NaN
264 2022-10-21 03:40:12 NaN
...
264 2022-10-24 08:00:12 NaN
264 2022-10-24 08:20:12 NaN
265 2022-10-24 08:46:38 1.788360
266 2022-10-24 09:06:47 1.442915
267 2022-10-24 09:27:28 1.246774
268 2022-10-24 09:46:58 0.194352