I have the following table and I need to re-calculate the StartDate
based on some business rules that I was given. Specifically, when the StartDate
of the succeeding session is within one calendar day (i.e., < 2 days) the sessions are combined, using the earlier StartDate
and the later EndDate
.
StartDate | EndDate | DaysBetweenSessions |
---|---|---|
2011-04-15 | 2011-04-21 | 3 |
2011-04-24 | 2011-04-27 | 3 |
2011-04-30 | 2011-05-01 | 0 |
2011-05-01 | 2011-05-06 | 5 |
2011-05-11 | 2011-05-15 | 1 |
2011-05-16 | 2011-05-17 | 0 |
2011-05-17 | 2011-05-25 | 10 |
What I need as a result:
StartDate | EndDate | DaysBetweenSessions | NewStartDate | NewEndDate |
---|---|---|---|---|
2011-04-15 | 2011-04-21 | 3 | 2011-04-15 | 2011-04-21 |
2011-04-24 | 2011-04-27 | 3 | 2011-04-24 | 2011-04-27 |
2011-04-30 | 2011-05-01 | 0 | 2011-04-30 | 2011-05-06 |
2011-05-01 | 2011-05-06 | 5 | 2011-04-30 | 2011-05-06 |
2011-05-11 | 2011-05-15 | 1 | 2011-05-11 | 2011-05-25 |
2011-05-16 | 2011-05-17 | 0 | 2011-05-11 | 2011-05-25 |
2011-05-17 | 2011-05-25 | 10 | 2011-05-11 | 2011-05-25 |
I have attempted to use df.shift
to get the PreviousStartDate, PreviousEndDate, NextStartDate, NextEndDate
, and then tried to loop through the dataset with df.iterrows()
, calculating the new dates, but I run into a technical issue that I don't fully understand - while in the loop that was used to update a value, I cannot use an operator on that new value - the new value hasn't been written to the df yet.
CodePudding user response:
Try with groupby
and tranform
:
#convert to datetime if necessary
df["StartDate"] = pd.to_datetime(df["StartDate"])
df["EndDate"] = pd.to_datetime(df["EndDate"])
df["NewStartDate"] = df.groupby((df["DaysBetweenSessions"]>=2).cumsum().shift().fillna(0))["StartDate"].transform("min")
df["NewEndDate"] = df.groupby((df["DaysBetweenSessions"]>=2).cumsum().shift().fillna(0))["EndDate"].transform("max")
>>> df
StartDate EndDate DaysBetweenSessions NewStartDate NewEndDate
0 2011-04-15 2011-04-21 3 2011-04-15 2011-04-21
1 2011-04-24 2011-04-27 3 2011-04-24 2011-04-27
2 2011-04-30 2011-05-01 0 2011-04-30 2011-05-06
3 2011-05-01 2011-05-06 5 2011-04-30 2011-05-06
4 2011-05-11 2011-05-15 1 2011-05-11 2011-05-25
5 2011-05-16 2011-05-17 0 2011-05-11 2011-05-25
6 2011-05-17 2011-05-25 10 2011-05-11 2011-05-25