I have a DataFrame of ids and dates. Within each id, I'd like every date that falls inside a 2-day window to be set to the first date of that window. I'm having trouble writing a function for this. It's like the pandas equivalent of SQL's OVER (PARTITION BY ...).
Input:
import pandas as pd
d1 = {'id': ['a','a','a','a','b','a','b'], 'datetime': ['10/25/2021 0:00','10/26/2021 0:00','11/28/2021 0:00','11/29/2021 0:00','11/29/2021 0:00','11/30/2021 0:00','11/30/2021 0:00']}
df1 = pd.DataFrame(d1)
Desired Output:
d3 = {'id': ['a','a','a','a','a','b','b'], 'datetime': ['10/25/2021 0:00','10/25/2021 0:00','11/28/2021 0:00','11/28/2021 0:00','11/30/2021 0:00','11/29/2021 0:00','11/29/2021 0:00']}
df3 = pd.DataFrame(d3)
So the first step would be to sort by id, then by datetime. The function should then take the first value for an id, check whether the next value falls within the 2-day window, and if so set its date to the first value, continuing through the rest of that id. When a value falls outside the window, it becomes the new first value and the process repeats.
CodePudding user response:
Try:
df["datetime"] = df["datetime"].iloc[::2].reindex(df.index).ffill()
>>> df
  id         datetime
0  a  11/25/2021 0:00
1  a  11/25/2021 0:00
2  a  11/28/2021 0:00
3  b  11/28/2021 0:00
4  a  11/30/2021 0:00
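To see what the one-liner does, here is a sketch that splits it into its three steps. The input frame is made up for illustration (only the output above comes from the answer), and the names `every_other`, `aligned`, and `filled` are hypothetical. It assumes a default 0..n-1 index.

```python
import pandas as pd

# Made-up input for illustration only; not the asker's data.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "a"],
    "datetime": ["11/25/2021 0:00", "11/26/2021 0:00", "11/28/2021 0:00",
                 "11/29/2021 0:00", "11/30/2021 0:00"],
})

every_other = df["datetime"].iloc[::2]   # keep rows 0, 2, 4 by position
aligned = every_other.reindex(df.index)  # rows 1 and 3 come back as NaN
filled = aligned.ffill()                 # each NaN copies the row above it

df["datetime"] = filled
print(df)
```

Note that the pairing is purely positional: the id column and the actual gap between dates are never consulted, which is why row 3 (id b) ends up with the date from row 2 (id a).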
CodePudding user response:
Try this:
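from datetime import datetime as dt
FMT = "%m/%d/%Y %H:%M"
# Note: rows are processed in their current order and id is not checked.
# Start the first window at the first row's date.
oldest = dt.strptime(df1['datetime'][0], FMT)
for t in range(df1['datetime'].shape[0]):
    current = dt.strptime(df1['datetime'][t], FMT)
    # A gap of 2 or more days starts a new window.
    if (current - oldest).days > 1:
        oldest = current
    # Overwrite the row (column 1 is 'datetime') with the start of its window.
    df1.iloc[t, 1] = oldest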
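The loop above walks the rows in their existing order and never looks at id. Below is a minimal sketch of the same rule applied per id after sorting, as the question describes. The function name `anchor_dates` and its `window_days` parameter are hypothetical. It assumes the dates are strings in the "%m/%d/%Y %H:%M" format and that a gap of 2 or more whole days from the window's first date starts a new window, which is inferred from the desired output.

```python
import pandas as pd

def anchor_dates(df, window_days=2):
    """Hypothetical helper: per id, replace each date with the first date of its window."""
    out = df.copy()
    out["datetime"] = pd.to_datetime(out["datetime"], format="%m/%d/%Y %H:%M")
    out = out.sort_values(["id", "datetime"]).reset_index(drop=True)

    anchored = []
    prev_id, start = None, None
    for row_id, ts in zip(out["id"], out["datetime"]):
        # Open a new window on a new id, or when the gap from the
        # window's first date reaches window_days (assumed boundary).
        if row_id != prev_id or (ts - start).days >= window_days:
            prev_id, start = row_id, ts
        anchored.append(start)

    out["datetime"] = anchored
    return out

d1 = {
    "id": ["a", "a", "a", "a", "b", "a", "b"],
    "datetime": ["10/25/2021 0:00", "10/26/2021 0:00", "11/28/2021 0:00",
                 "11/29/2021 0:00", "11/29/2021 0:00", "11/30/2021 0:00",
                 "11/30/2021 0:00"],
}
df1 = pd.DataFrame(d1)
result = anchor_dates(df1)
print(result)
```

On the question's input this gives 10/25, 10/25, 11/28, 11/28, 11/30 for id a and 11/29, 11/29 for id b. The column comes back as datetime64 rather than strings; `result["datetime"].dt.strftime(...)` would convert it back if the original text format is needed.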