Dataframe - find last time of the day, grouped by day


I have a simple dataframe with a datetime column and the corresponding date:

df = pd.DataFrame( [['2021-01-01 10:10', '2021-01-01'],
                    ['2021-01-03 13:33', '2021-01-03'],
                    ['2021-01-03 14:44', '2021-01-03'],
                    ['2021-01-07 17:17', '2021-01-07'],
                    ['2021-01-07 07:07', '2021-01-07'],
                    ['2021-01-07 01:07', '2021-01-07'],
                    ['2021-01-09 09:09', '2021-01-09']],
                    columns=['datetime', 'date'])

I would like to create a new column containing the last datetime of each day. What I have is quite close, but the value is only filled in on the row that holds the last datetime of the day; every other cell gets NaT (Not a Time). Can you suggest something better?

df['eod']=df.groupby('date')['datetime'].tail(1)

CodePudding user response:

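You don't really need a separate date column if the date part comes from the datetime column. You can group by dt.day of the datetime column, then call last to get the datetime value. Two caveats: dt.day is the day of the month, so this only works while the data stays within a single month, and last returns the last row of each group in its existing order, not necessarily the latest time (see day 7 below).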

>>> df['datetime'] = pd.to_datetime(df['datetime'])
>>> df.groupby(df['datetime'].dt.day)['datetime'].last()

datetime
1   2021-01-01 10:10:00
3   2021-01-03 14:44:00
7   2021-01-07 01:07:00
9   2021-01-09 09:09:00
Name: datetime, dtype: datetime64[ns]
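If the data can span more than one month, grouping on the full calendar date is safer than grouping on the day of the month. A minimal sketch, assuming a small made-up frame (not the asker's data) and assuming the latest time is wanted, so it uses max rather than last:

```python
import pandas as pd

# Hypothetical sample: two rows on 7 January and one on 7 February
df = pd.DataFrame({'datetime': pd.to_datetime([
    '2021-01-07 17:17',
    '2021-01-07 01:07',
    '2021-02-07 08:08',
])})

# dt.normalize() floors each timestamp to midnight, so the group key
# is the full calendar date and the two months stay separate
by_date = df.groupby(df['datetime'].dt.normalize())['datetime'].max()

# dt.day keeps only the day of the month, so 7 January and
# 7 February end up in the same group
by_day_of_month = df.groupby(df['datetime'].dt.day)['datetime'].max()

print(by_date)
print(by_day_of_month)
```

Here by_date has one entry per calendar date, while by_day_of_month collapses both months into a single group keyed by 7.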

CodePudding user response:

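You are probably looking for transform, which broadcasts the per-group result back to every row of the group. Selecting the datetime column explicitly keeps the result a Series even if the dataframe has more columns. Note that 'last' follows row order; 'max' would give the latest time instead.

df['eod'] = df.groupby('date')['datetime'].transform('last')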

Output

           datetime        date               eod
0  2021-01-01 10:10  2021-01-01  2021-01-01 10:10
1  2021-01-03 13:33  2021-01-03  2021-01-03 14:44
2  2021-01-03 14:44  2021-01-03  2021-01-03 14:44
3  2021-01-07 17:17  2021-01-07  2021-01-07 01:07
4  2021-01-07 07:07  2021-01-07  2021-01-07 01:07
5  2021-01-07 01:07  2021-01-07  2021-01-07 01:07
6  2021-01-09 09:09  2021-01-09  2021-01-09 09:09
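For 2021-01-07 the rows are not in chronological order, so the last row (01:07) is not the latest time of that day (17:17). If the latest time is what is wanted, a sketch along these lines should work, assuming the datetime column is first converted with pd.to_datetime. It also shows why the tail(1) attempt leaves gaps: tail returns only one row per group, and assignment aligns on the index.

```python
import pandas as pd

df = pd.DataFrame([['2021-01-01 10:10', '2021-01-01'],
                   ['2021-01-03 13:33', '2021-01-03'],
                   ['2021-01-03 14:44', '2021-01-03'],
                   ['2021-01-07 17:17', '2021-01-07'],
                   ['2021-01-07 07:07', '2021-01-07'],
                   ['2021-01-07 01:07', '2021-01-07'],
                   ['2021-01-09 09:09', '2021-01-09']],
                  columns=['datetime', 'date'])

# Convert to real timestamps so max compares times, not strings
df['datetime'] = pd.to_datetime(df['datetime'])

# tail(1) keeps only one row per group (original index preserved),
# so assigning it to a column leaves NaT on every other row
tail_only = df.groupby('date')['datetime'].tail(1)

# transform returns one value per original row, so every row of a
# group receives that group's maximum
df['eod'] = df.groupby('date')['datetime'].transform('max')

print(df)
```

Sorting by datetime first and keeping transform('last') would be an alternative; max avoids depending on row order.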