I have a dataframe laid out like the following with site names and a range of dates for each site.
Site Date
Site_1 02/09/2011
Site_1 03/09/2011
Site_1 04/09/2011
Site_1 05/09/2011
Site_2 14/01/2010
Site_2 15/01/2010
Site_2 16/01/2010
Site_2 17/01/2010
I'd like to find the start and end dates for each site and lay the dataframe out like this:
Site Start_Date End_Date
Site_1 02/09/2011 04/09/2011
Site_2 14/01/2010 17/01/2010
I know I can find the min and max of the full range like this:
df['Date'].max()
df['Date'].min()
Just not sure what the best method is to apply this separately to each site.
CodePudding user response:
I would advise using a groupby
on the "site" column and aggregating each group into a min
and max
date.
df.groupby("Site").agg({'date': ['min', 'max']})
This will return the min
and max
date for each site.
I haven't tried out the code, but it should do what you want.
CodePudding user response:
Using the advanced syntax of groupby
agg
you can directly get your custom format:
df.groupby('Site')['Date'].agg([('Start_Date', 'min'), ('End_Date', 'max')])
output:
Start_Date End_Date
Site
Site_1 02/09/2011 05/09/2011
Site_2 14/01/2010 17/01/2010