Home > OS >  Group rows sequentially in python
Group rows sequentially in python

Time:06-20

I have a data set as such and I need to group them sequentially, and join the dates.

number, date 

123456,2021-01-16
123456,2021-01-18
98765,2021-01-19 
98765,2021-02-01
123456,2021-02-02
123456,2021-02-03
123456,2021-02-09
123456,2021-02-11
7645323,2021-02-13
7645323,2021-02-16
7645323,2021-02-17

I want the output to be like,

number,date 

123456, 2021-01-16 to 2021-01-18
98765, 2021-01-19 to 2021-02-01
123456, 2021-02-02 to 2021-02-11
7645323, 2021-02-13 to 2021-02-17

This is my code, referred from this enter image description here

NOTE:

Link to test data and output data

CodePudding user response:

You can do with shift and cumsum create the groupby key

key = df.number.ne(df.number.shift()).cumsum()
out = df.groupby(key).agg({'number':'first',
                           'date' : lambda x : x.iloc[0]   ' to '    x.iloc[-1]})
Out[822]: 
         number                       date
number                                    
1        123456   2021-01-16 to 2021-01-18
2         98765  2021-01-19  to 2021-02-01
3        123456   2021-02-02 to 2021-02-11
4       7645323   2021-02-13 to 2021-02-17
  • Related