Home > OS >  Daily maximum in pandas with missing dates
Daily maximum in pandas with missing dates

Time:03-01

I am currently somewhat stuck on getting the daily maximum for my dataset. It looks like this:

                      Date  Value
0      1996-03-07 21:30:00  360.0
1      1996-03-07 21:45:00  360.0
2      1996-03-07 22:00:00  360.0
3      1996-03-07 22:15:00  360.0
4      1996-03-07 22:30:00  360.0
...                    ...    ...
867882 2021-02-03 12:45:00  361.9
867883 2021-02-03 13:00:00  361.8
867884 2021-02-03 13:15:00  361.8
867885 2021-02-03 13:30:00  361.7
867886 2021-02-03 13:45:00  361.8
[867887 rows x 2 columns]

The problem is that inside the dataset entire days are missing. If I understood correctly the grouper in pandas needs continues days to work properly. So I refilled the dates:

df.set_index('Date', inplace=True)
all_days = pd.date_range(df.index.min(), df.index.max(), freq='15T')
df = df.reindex(all_days)

But when I now run my code to get the daily maximum

daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D')).idxmax().iloc[:, 0]]

I get the following error message:

The DTypes <class 'numpy.dtype[float64]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.

When I check with df.index, I get

DatetimeIndex(['1996-03-07 21:30:00', '1996-03-07 21:45:00',
           '1996-03-07 22:00:00', '1996-03-07 22:15:00',
           '1996-03-07 22:30:00', '1996-03-07 22:45:00',
           '1996-03-07 23:00:00', '1996-03-07 23:15:00',
           '1996-03-07 23:30:00', '1996-03-07 23:45:00',
           ...
           '2021-02-03 11:30:00', '2021-02-03 11:45:00',
           '2021-02-03 12:00:00', '2021-02-03 12:15:00',
           '2021-02-03 12:30:00', '2021-02-03 12:45:00',
           '2021-02-03 13:00:00', '2021-02-03 13:15:00',
           '2021-02-03 13:30:00', '2021-02-03 13:45:00'],
          dtype='datetime64[ns]', length=873474, freq='15T')

and checking the dtype of my Value column returns dtype('float64').

I am probably missing something very obvious here, but I'm honestly not familiar at all with dtypes and date formats.

CodePudding user response:

I think here is problem is not defined column after groupby, so is not returned Series, but DataFrame:

daily_maximum = df.loc[df.groupby(pd.Grouper(freq='D'))['value'].idxmax()]
  • Related