I'm trying to group the following dataframe with agg, using count and sum, and I also want the average time difference in minutes between consecutive values of the dataframe's DatetimeIndex. When I pass df.index as a key, I get the error TypeError: unhashable type: 'DatetimeIndex'
data = df.groupby('letter').agg({'letter': 'count', 'occurences' : 'sum', df.index: lambda x: x.diff().mean().dt.minutes})
If I reset the index so that 'timestamp' becomes a column:
data = df.reset_index().groupby('letter').agg({'letter': 'count', 'occurences' : 'sum', 'timestamp': lambda x: x.diff().mean().dt.minutes})
I get the error AttributeError: 'NaTType' object has no attribute 'dt'. I believe that is because letter 'D' has only one row, so there is no difference to average. I was hoping to treat that case as 0 minutes.
timestamp                  letter  occurences
2021-11-29 12:00:00+00:00  A       100
2021-11-29 12:30:00+00:00  B       500
2021-11-29 13:00:00+00:00  B       300
2021-11-30 13:30:00+00:00  A       1000
2021-11-30 14:00:00+00:00  C       250
2021-11-29 14:30:00+00:00  A       100
2021-11-29 15:00:00+00:00  D       500
2021-11-29 15:30:00+00:00  A       300
2021-11-30 16:00:00+00:00  A       1000
2021-11-30 16:30:00+00:00  C       250
CodePudding user response:
Try dividing the mean difference by np.timedelta64(1, 'm') to convert it to minutes (this needs import numpy as np):
data = df.reset_index().groupby('letter').agg({'letter': 'count', 'occurences' : 'sum', 'timestamp': lambda x: (x.diff().mean())/np.timedelta64(1, 'm')})
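To see why the division works where .dt does not, here is a minimal sketch on two standalone Series. The timestamps and variable names are made up for illustration and are not the asker's full data. The point is that diff().mean() returns a single scalar (a Timedelta or NaT), not a Series, so there is no .dt accessor to call.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps, chosen only for illustration.
two_rows = pd.Series(pd.to_datetime(["2021-11-29 12:30:00", "2021-11-29 13:00:00"]))
one_row = pd.Series(pd.to_datetime(["2021-11-29 15:00:00"]))

one_minute = np.timedelta64(1, "m")

# diff().mean() is a scalar Timedelta, so dividing by one minute gives a float.
minutes_two = two_rows.diff().mean() / one_minute

# With a single row, diff() contains only NaT and its mean is NaT;
# dividing NaT by a timedelta gives NaN instead of raising.
minutes_one = one_row.diff().mean() / one_minute

print(minutes_two, minutes_one)
```

Note that a group with a single row comes out as NaN here, not 0, so a fillna(0) on the resulting column would still be needed to get the 0 minutes asked for.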
CodePudding user response:
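You can use total_seconds() on the mean difference, after resetting the index so that 'timestamp' is a column:
data = df.reset_index().groupby('letter').agg(
    {'letter': 'count',
     'occurences' : 'sum',
     'timestamp': lambda x: x.diff().mean().total_seconds()/60.0
    })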
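If single-row groups such as 'D' should come out as 0 minutes, one possible variation is sketched below. It is a different technique from the lambda above: the per-letter gaps are computed first with a grouped diff(), stored as float minutes, and then averaged with the built-in mean. The column names gap_min and avg_gap_min and the named-aggregation layout are my own choices, and the UTC offset is left out of the sample because it does not affect the differences.

```python
import pandas as pd

# Sample rebuilt from the question, rows kept in the original order.
df = pd.DataFrame(
    {
        "letter": list("ABBACADAAC"),
        "occurences": [100, 500, 300, 1000, 250, 100, 500, 300, 1000, 250],
    },
    index=pd.DatetimeIndex(
        [
            "2021-11-29 12:00", "2021-11-29 12:30", "2021-11-29 13:00",
            "2021-11-30 13:30", "2021-11-30 14:00", "2021-11-29 14:30",
            "2021-11-29 15:00", "2021-11-29 15:30", "2021-11-30 16:00",
            "2021-11-30 16:30",
        ],
        name="timestamp",
    ),
)

flat = df.reset_index()

# Gap to the previous row of the same letter, in minutes (NaN for the first row).
flat["gap_min"] = flat.groupby("letter")["timestamp"].diff().dt.total_seconds() / 60.0

data = (
    flat.groupby("letter")
    .agg(
        count=("occurences", "count"),
        occurences=("occurences", "sum"),
        avg_gap_min=("gap_min", "mean"),  # NaN when a letter has a single row
    )
    .fillna({"avg_gap_min": 0.0})  # treat "no gap" as 0 minutes
)

print(data)
```

Because the gaps are plain floats before aggregation, no custom function runs on the datetime column, and the single-row case is handled by one fillna at the end.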