I have a simple DataFrame with a datetime column and a column holding the corresponding date:
import pandas as pd

df = pd.DataFrame( [['2021-01-01 10:10', '2021-01-01'],
['2021-01-03 13:33', '2021-01-03'],
['2021-01-03 14:44', '2021-01-03'],
['2021-01-07 17:17', '2021-01-07'],
['2021-01-07 07:07', '2021-01-07'],
['2021-01-07 01:07', '2021-01-07'],
['2021-01-09 09:09', '2021-01-09']],
columns=['datetime', 'date'])
I would like to create a new column containing the last datetime of each day. My attempt is close, but the value is only filled in on the last row of each day, and every other row gets NaT (Not a Time). Can you suggest something better?
df['eod']=df.groupby('date')['datetime'].tail(1)
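The NaT values come from index alignment. The sketch below (which assumes the datetime column has been converted with pd.to_datetime) shows that tail(1) keeps only one row per group, under that row's original index label, so assigning it back leaves every other row without a matching value.

```python
import pandas as pd

df = pd.DataFrame(
    [['2021-01-01 10:10', '2021-01-01'],
     ['2021-01-03 13:33', '2021-01-03'],
     ['2021-01-03 14:44', '2021-01-03'],
     ['2021-01-07 17:17', '2021-01-07'],
     ['2021-01-07 07:07', '2021-01-07'],
     ['2021-01-07 01:07', '2021-01-07'],
     ['2021-01-09 09:09', '2021-01-09']],
    columns=['datetime', 'date'])
df['datetime'] = pd.to_datetime(df['datetime'])

# tail(1) returns one row per group, keeping that row's original index label.
last_rows = df.groupby('date')['datetime'].tail(1)
print(last_rows.index.tolist())  # [0, 2, 5, 6]

# Column assignment aligns on the index: rows 1, 3 and 4 have no
# matching label in last_rows, so they are filled with NaT.
df['eod'] = last_rows
print(df['eod'].isna().tolist())
```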
Answer:
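You don't really need a separate date column if the date part comes from the datetime column. Convert datetime to a real datetime dtype, group by dt.day of that column, then call last to get the final datetime value of each group. Note that dt.day is only the day of the month, so this works here because every row falls in January; for data spanning several months, group by dt.date instead: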
>>> df['datetime'] = pd.to_datetime(df['datetime'])
>>> df.groupby(df['datetime'].dt.day)['datetime'].last()
datetime
1 2021-01-01 10:10:00
3 2021-01-03 14:44:00
7 2021-01-07 01:07:00
9 2021-01-09 09:09:00
Name: datetime, dtype: datetime64[ns]
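Grouping by dt.day uses only the day of the month, so dates from different months that share a day number end up in the same group. A minimal sketch with hypothetical data spanning January and February contrasts it with grouping by dt.date, which keeps the full calendar date:

```python
import pandas as pd

# Hypothetical timestamps spanning two months (not from the question).
s = pd.to_datetime(pd.Series(['2021-01-01 10:10',
                              '2021-02-01 08:00',
                              '2021-02-01 09:30']))

# dt.day is the day of the month only: Jan 1 and Feb 1 both map to key 1,
# so they collapse into a single group.
by_day = s.groupby(s.dt.day).last()

# dt.date keeps year, month and day, so each calendar date is its own group.
by_date = s.groupby(s.dt.date).last()

print(by_day)
print(by_date)
```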
Answer:
You are probably looking for transform, which broadcasts each group's result back to every row in that group.
df['eod'] = df.groupby('date')['datetime'].transform('last')
Output
datetime date eod
0 2021-01-01 10:10 2021-01-01 2021-01-01 10:10
1 2021-01-03 13:33 2021-01-03 2021-01-03 14:44
2 2021-01-03 14:44 2021-01-03 2021-01-03 14:44
3 2021-01-07 17:17 2021-01-07 2021-01-07 01:07
4 2021-01-07 07:07 2021-01-07 2021-01-07 01:07
5 2021-01-07 01:07 2021-01-07 2021-01-07 01:07
6 2021-01-09 09:09 2021-01-09 2021-01-09 09:09
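Note that 'last' takes the value from the last row of each group in the current row order, which is why 2021-01-07 gets 01:07 in the output above even though 17:17 is later that day. If "last datetime of the day" is meant as the latest timestamp, a sketch assuming the column is converted to a datetime dtype first would use 'max' instead:

```python
import pandas as pd

df = pd.DataFrame(
    [['2021-01-01 10:10', '2021-01-01'],
     ['2021-01-03 13:33', '2021-01-03'],
     ['2021-01-03 14:44', '2021-01-03'],
     ['2021-01-07 17:17', '2021-01-07'],
     ['2021-01-07 07:07', '2021-01-07'],
     ['2021-01-07 01:07', '2021-01-07'],
     ['2021-01-09 09:09', '2021-01-09']],
    columns=['datetime', 'date'])
# Convert so that 'max' compares real timestamps rather than strings.
df['datetime'] = pd.to_datetime(df['datetime'])

g = df.groupby('date')['datetime']
# 'last': value from the last row of each group, in current row order.
df['eod_last'] = g.transform('last')
# 'max': latest timestamp in each group, regardless of row order.
df['eod_max'] = g.transform('max')
print(df)
```

Sorting first with df.sort_values('datetime') and then using 'last' would give the same result as 'max'; 'max' simply avoids depending on row order.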