Following on from my previous question, "filling in the missing times using average for the values python", how would I do the same but stop at the end of each day? I have tried grouping, but that seems to remove a lot of data. This is the data I start with:
import numpy as np
import pandas as pd

time = np.array([
    pd.to_datetime("2022-01-01 00:00:00"), pd.to_datetime("2022-01-01 00:00:01"),
    pd.to_datetime("2022-01-01 00:00:03"), pd.to_datetime("2022-01-01 00:00:04"),
    pd.to_datetime("2022-01-02 00:00:07"), pd.to_datetime("2022-01-02 00:00:09"),
    pd.to_datetime("2022-01-02 00:00:10"),
])
lat = [58.1, 58.4, 58.5, 58.9, 52, 52.2, 52.5]
lng = [1.34, 1.44, 1.46, 1.48, 1.35, 1.37, 1.39]
df = pd.DataFrame({"time": time, "lat": lat, "lng": lng})
time lat lng
2022-01-01 00:00:00 58.1 1.34
2022-01-01 00:00:01 58.4 1.44
2022-01-01 00:00:03 58.5 1.46
2022-01-01 00:00:04 58.9 1.48
2022-01-02 00:00:07 52.0 1.35
2022-01-02 00:00:09 52.2 1.37
2022-01-02 00:00:10 52.5 1.39
and the expected output would be:
time lat lng
2022-01-01 00:00:00 58.1 1.34
2022-01-01 00:00:01 58.4 1.44
2022-01-01 00:00:02 58.45 1.45
2022-01-01 00:00:03 58.5 1.46
2022-01-01 00:00:04 58.9 1.48
2022-01-02 00:00:07 52.0 1.35
2022-01-02 00:00:08 52.1 1.36
2022-01-02 00:00:09 52.2 1.37
2022-01-02 00:00:10 52.5 1.39
Using this:
df = df.set_index('time').asfreq(freq='S').interpolate()
This works perfectly when all my data is from the same day. How would I make it reset at the start of the next day?
CodePudding user response:
You can use groupby with a custom function passed to apply to run the relevant interpolation logic on each day separately:
def func(x):
    # Fill in every missing second within this group, then interpolate the gaps
    return x.set_index('time').asfreq(freq='S').interpolate().reset_index()

df.groupby(df['time'].dt.day).apply(func).reset_index(drop=True)
Result:
time lat lng
0 2022-01-01 00:00:00 58.10 1.34
1 2022-01-01 00:00:01 58.40 1.44
2 2022-01-01 00:00:02 58.45 1.45
3 2022-01-01 00:00:03 58.50 1.46
4 2022-01-01 00:00:04 58.90 1.48
5 2022-01-02 00:00:07 52.00 1.35
6 2022-01-02 00:00:08 52.10 1.36
7 2022-01-02 00:00:09 52.20 1.37
8 2022-01-02 00:00:10 52.50 1.39
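One caveat: `dt.day` is the day of the month, so if the data spans more than one month, 1 January and 1 February would end up in the same group and every second between them would be filled in. A minimal sketch of a variant that groups on the full calendar date instead, assuming the timestamps are unique and sorted within each day. The helper name `fill_seconds_per_day` is made up for illustration, and the lowercase `"s"` alias is used because recent pandas versions warn about the uppercase `"S"`:

```python
import pandas as pd


def fill_seconds_per_day(df, time_col="time"):
    """Interpolate missing seconds within each calendar day, never across days.

    Hypothetical helper for illustration; not part of pandas.
    """
    pieces = []
    # dt.normalize() keeps year, month and day, so the same day number
    # in different months ends up in different groups.
    for _, day_df in df.groupby(df[time_col].dt.normalize()):
        filled = (
            day_df.set_index(time_col)
            .asfreq("s")      # one row per second between the day's first and last sample
            .interpolate()    # linear interpolation over the inserted rows
            .reset_index()
        )
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)


# Same data as in the question
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-01 00:00:00", "2022-01-01 00:00:01", "2022-01-01 00:00:03",
        "2022-01-01 00:00:04", "2022-01-02 00:00:07", "2022-01-02 00:00:09",
        "2022-01-02 00:00:10",
    ]),
    "lat": [58.1, 58.4, 58.5, 58.9, 52, 52.2, 52.5],
    "lng": [1.34, 1.44, 1.46, 1.48, 1.35, 1.37, 1.39],
})
out = fill_seconds_per_day(df)
print(out)
```

For the sample data this should give the same nine rows as the Result above. Looping over the groups and concatenating, rather than using `apply`, is only a design choice to keep the `time` column handling explicit.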