This question was asked before, but I want to extend on it. Because I do not have enough experience points I could not comment on the question so I am reposting the link below followed by my comments:
Grouping by date range with pandas
I believe asker of this question wants to group items together within a specified timedelta of each other (3 days is specified in the question). However the answers, including the one marked correct, relate to grouping items in frequencies of 3 days using Grouper
. This eventually suits the asker because he only wants to group at most two items together, but what happens if this extends to three, four, five or more items?
Continuing the askers example code (which very closely relates to my own problem):
user_id date val
1 1-1-17 1
2 1-1-17 1
3 1-1-17 1
1 1-1-17 1
1 1-2-17 1
2 1-2-17 1
2 1-10-17 1
3 2-1-17 1
3 2-2-17 1
3 2-3-17 2
3 2-4-17 3
3 2-5-17 1
If the grouping would group by user_id and dates /- 3 days from each other the group by summing val should look like:
user_id date sum(val)
1 1-2-17 3
2 1-2-17 2
2 1-10-17 1
3 1-1-17 1
3 2-1-17 8
I'm not sure the last date will actually show as 2-1-17, but the idea is to group all dates within a 3-day timedelta of each other together.
Is this possible in an elegant way using Grouper
, resample
or other Pandas or Python date functions?
CodePudding user response:
You can use a groupby
with a custom group:
# convert to datetime
s = pd.to_datetime(df['date'], dayfirst=False)
# set up groups of consecutive dates within ± 3 days
group = (s.groupby(df['user_id'])
.apply(lambda s: s.diff().abs().gt('3days').cumsum())
)
# group by ID and new group and aggregate
out = (df.groupby(['user_id', group], as_index=False)
.agg({'date': 'last', 'val': 'sum'})
)
output:
user_id date val
0 1 1-2-17 3
1 2 1-2-17 2
2 2 1-10-17 1
3 3 1-1-17 1
4 3 2-5-17 8
intermediates (sorted by user_id
for clarity):
user_id date val datetime diff abs >3days cumsum
0 1 1-1-17 1 2017-01-01 NaT NaT False 0
3 1 1-1-17 1 2017-01-01 0 days 0 days False 0
4 1 1-2-17 1 2017-01-02 1 days 1 days False 0
1 2 1-1-17 1 2017-01-01 NaT NaT False 0
5 2 1-2-17 1 2017-01-02 1 days 1 days False 0
6 2 1-10-17 1 2017-01-10 8 days 8 days True 1
2 3 1-1-17 1 2017-01-01 NaT NaT False 0
7 3 2-1-17 1 2017-02-01 31 days 31 days True 1
8 3 2-2-17 1 2017-02-02 1 days 1 days False 1
9 3 2-3-17 2 2017-02-03 1 days 1 days False 1
10 3 2-4-17 3 2017-02-04 1 days 1 days False 1
11 3 2-5-17 1 2017-02-05 1 days 1 days False 1