Grouping by consecutive dates into date ranges using Python-CodePudding

I have the following data set.

ID     Date   
abc    2017-01-07  
abc    2017-01-08  
abc    2017-01-09  
abc    2017-12-09  
xyz    2017-01-05  
xyz    2017-01-06 
xyz    2017-04-15  
xyz    2017-04-16

I am able to generate the following output

ID     Count
abc    3
abc    1
xyz    2
xyz    2

using the following code mentioned in count consecutive days python dataframe

d = {
    'ID': ['abc', 'abc', 'abc', 'abc', 'xyz', 'xyz', 'xyz', 'xyz'],
    'Date': ['2017-01-07','2017-01-08', '2017-01-09', '2017-12-09', '2017-01-05', '2017-01-06', '2017-04-15', '2017-04-16']
}

df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df['Date'])

series = df.groupby('ID').Date.diff().dt.days.ne(1).cumsum()
df.groupby(['ID', series]).size().reset_index(level=1, drop=True)

How can I get the following output?

ID     Start        End
abc    2017-01-07   2017-01-09
abc    2017-12-09   2017-12-09
xyz    2017-01-05   2017-01-06
xyz    2017-04-15   2017-04-16

CodePudding user response：

You can use:

series = df.groupby('ID').Date.diff().dt.days.ne(1).cumsum()

(df.groupby(['ID', series])
   .agg(Start=('Date', 'min'), End=('Date', 'min'))
   .droplevel(1)
   .reset_index()
)

output:

    ID      Start        End
0  abc 2017-01-07 2017-01-07
1  abc 2017-12-09 2017-12-09
2  xyz 2017-01-05 2017-01-05
3  xyz 2017-04-15 2017-04-15

CodePudding user response：

Use @mozway's answer

Use agg:

out = df.groupby(df.groupby('ID')['Date'].diff().ne(pd.Timedelta(days=1)).cumsum()) \
           ['Date'].agg(**{'Start': 'first', 'End': 'last'}).reset_index()
print(out)

# Output:
          Start        End
Date                      
1    2017-01-07 2017-01-09
2    2017-12-09 2017-12-09
3    2017-01-05 2017-01-06
4    2017-04-15 2017-04-16