I am currently somewhat stuck on getting the daily maximum for my dataset. It looks like this:
Date Value
0 1996-03-07 21:30:00 360.0
1 1996-03-07 21:45:00 360.0
2 1996-03-07 22:00:00 360.0
3 1996-03-07 22:15:00 360.0
4 1996-03-07 22:30:00 360.0
... ... ...
867882 2021-02-03 12:45:00 361.9
867883 2021-02-03 13:00:00 361.8
867884 2021-02-03 13:15:00 361.8
867885 2021-02-03 13:30:00 361.7
867886 2021-02-03 13:45:00 361.8
[867887 rows x 2 columns]
The problem is that inside the dataset entire days are missing. If I understood correctly the grouper in pandas needs continues days to work properly. So I refilled the dates:
df.set_index('Date', inplace=True)
all_days = pd.date_range(df.index.min(), df.index.max(), freq='15T')
df = df.reindex(all_days)
But when I now run my code to get the daily maximum
daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:, 0]]
I get the following error message:
The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
When I check with df.index
, I get
DatetimeIndex(['1996-03-07 21:30:00', '1996-03-07 21:45:00',
'1996-03-07 22:00:00', '1996-03-07 22:15:00',
'1996-03-07 22:30:00', '1996-03-07 22:45:00',
'1996-03-07 23:00:00', '1996-03-07 23:15:00',
'1996-03-07 23:30:00', '1996-03-07 23:45:00',
...
'2021-02-03 11:30:00', '2021-02-03 11:45:00',
'2021-02-03 12:00:00', '2021-02-03 12:15:00',
'2021-02-03 12:30:00', '2021-02-03 12:45:00',
'2021-02-03 13:00:00', '2021-02-03 13:15:00',
'2021-02-03 13:30:00', '2021-02-03 13:45:00'],
dtype='datetime64[ns]', length=873474, freq='15T')
and checking the dtype of my Value
column returns dtype('float64')
.
I am probably missing something very obvious here, but I'm honestly not familiar at all with dtypes and date formats.
CodePudding user response:
I think here is problem is not defined column after groupby
, so is not returned Series
, but DataFrame:
daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D'))['value'].idxmax()]