I have the following df:
df
   Day   Name      Hour
1    M   John  10:00:00
2    M   John  11:00:00
3    M   John  14:00:00
4    T  Maria  20:00:00
5    T  Maria  22:30:00
Hour is a datetime column. My goal is to add a new column that holds, for each (Day, Name) group, the difference in hours between the earliest and latest Hour (for John it's 4 hours; Maria only has two rows, so it's 2h30, i.e. 2.5). Here is the desired output:
df
   Day   Name      Hour  Delay
1    M   John  10:00:00      4
2    M   John  11:00:00      4
3    M   John  14:00:00      4
4    T  Maria  20:00:00    2.5
5    T  Maria  22:30:00    2.5
Here is what I tried, but I'm stuck:
df['Delay']=df.groupby(['Day','Name']).transform((max(df['Hour'])-min(df['Hour']))
Thank you for your help!
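The attempt above does not run as written: its parentheses are unbalanced, and max(df['Hour']) - min(df['Hour']) is evaluated once over the whole column before transform ever sees a group. A minimal sketch of a per-group version, assuming Hour holds datetime64 values (the sample frame is rebuilt from the printed output, with times parsed onto a placeholder date that cancels out in the subtraction):

```python
import pandas as pd

# Sample data from the question; an explicit time-only format yields datetime64
# values on a placeholder date (1900-01-01), which cancels out in differences
df = pd.DataFrame(
    {
        "Day": ["M", "M", "M", "T", "T"],
        "Name": ["John", "John", "John", "Maria", "Maria"],
        "Hour": pd.to_datetime(
            ["10:00:00", "11:00:00", "14:00:00", "20:00:00", "22:30:00"],
            format="%H:%M:%S",
        ),
    },
    index=[1, 2, 3, 4, 5],
)

# transform("max") / transform("min") return one value per row, aligned with df
hours = df.groupby(["Day", "Name"])["Hour"]
span = hours.transform("max") - hours.transform("min")  # timedelta64 Series

# Convert the timedelta to a float number of hours
df["Delay"] = span.dt.total_seconds() / 3600
print(df)
```

Using the built-in "max"/"min" reductions keeps the work vectorized instead of calling a Python function once per group.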
Answer:
You can group by Day and Name and use transform, which broadcasts each group's result back to all of its rows:
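import pandas as pd

# Per (Day, Name) group: latest minus earliest Hour, as a float number of hours
df['Delay'] = (df.groupby(['Day', 'Name'])['Hour']
               .transform(lambda col: (pd.to_datetime(str(col.max())) - pd.to_datetime(str(col.min()))).total_seconds() / 3600))
print(df)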
   Day   Name      Hour  Delay
1    M   John  10:00:00    4.0
2    M   John  11:00:00    4.0
3    M   John  14:00:00    4.0
4    T  Maria  20:00:00    2.5
5    T  Maria  22:30:00    2.5
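The answer round-trips each group's max and min through str and pd.to_datetime, which suggests Hour may actually hold datetime.time objects (an object-dtype column) rather than datetime64. Under that assumption, a sketch that converts the whole column to timedelta64 once and then uses built-in group reductions instead of a per-group lambda:

```python
from datetime import time

import pandas as pd

# Assumed input: Hour stored as datetime.time objects (object dtype)
df = pd.DataFrame(
    {
        "Day": ["M", "M", "M", "T", "T"],
        "Name": ["John", "John", "John", "Maria", "Maria"],
        "Hour": [time(10), time(11), time(14), time(20), time(22, 30)],
    },
    index=[1, 2, 3, 4, 5],
)

# str(time) gives "HH:MM:SS", which pd.to_timedelta parses as an offset from midnight
offset = pd.to_timedelta(df["Hour"].astype(str))

# Group the converted Series by the original key columns (aligned on the index)
grouped = offset.groupby([df["Day"], df["Name"]])
df["Delay"] = (grouped.transform("max") - grouped.transform("min")).dt.total_seconds() / 3600
print(df)
```

Using total_seconds() rather than .seconds also keeps the result correct if a span ever reaches a full day, since .seconds drops the days component.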