I have a data set as such and I need to group them sequentially, and join the dates.
number, date
123456,2021-01-16
123456,2021-01-18
98765,2021-01-19
98765,2021-02-01
123456,2021-02-02
123456,2021-02-03
123456,2021-02-09
123456,2021-02-11
7645323,2021-02-13
7645323,2021-02-16
7645323,2021-02-17
I want the output to be like,
number,date
123456, 2021-01-16 to 2021-01-18
98765, 2021-01-19 to 2021-02-01
123456, 2021-02-02 to 2021-02-11
7645323, 2021-02-13 to 2021-02-17
This is my code, referred from this
NOTE:
Link to test data and output data
CodePudding user response:
You can do with shift
and cumsum
create the groupby
key
key = df.number.ne(df.number.shift()).cumsum()
out = df.groupby(key).agg({'number':'first',
'date' : lambda x : x.iloc[0] ' to ' x.iloc[-1]})
Out[822]:
number date
number
1 123456 2021-01-16 to 2021-01-18
2 98765 2021-01-19 to 2021-02-01
3 123456 2021-02-02 to 2021-02-11
4 7645323 2021-02-13 to 2021-02-17