I'm working with a pandas DataFrame whose MultiIndex consists of three levels:
[Verbundzuordnung, ProjektIndex, Datum].
I would like to resample the DataFrame hourly on Datum, but doing so drops the rightmost column, TagDesAbdichtens. I would like to keep it, since it is static.
Verbundzuordnung  ProjektIndex  Datum                      TagDesAbdichtens
1                 81679         2021-11-10 00:00:00 00:00  2021-12-08
                                2021-11-10 00:00:00 00:00  2021-12-08
                                2021-11-10 00:00:00 00:00  2021-12-08
                                2021-11-10 00:00:00 00:00  2021-12-08
                                2021-11-10 00:00:00 00:00  2021-12-08
...               ...           ...                        ...
2                 94574         2022-02-28 23:00:00 00:00  2022-01-31
                                2022-02-28 23:00:00 00:00  2022-01-31
                                2022-02-28 23:00:00 00:00  2022-01-31
                                2022-02-28 23:00:00 00:00  2022-01-31
                                2022-02-28 23:00:00 00:00  2022-01-31
285192 rows × 1 columns
There are additional columns that I left out here for readability.
I am currently using this to resample the DataFrame:
all_merged = all_merged.groupby([
    pd.Grouper(level='Verbundzuordnung'),
    pd.Grouper(level='ProjektIndex'),
    pd.Grouper(level='Datum', freq='H'),
])
all_merged.mean() gives me the desired output, but with TagDesAbdichtens missing.
This value is unique and static for each combination of Verbundzuordnung and ProjektIndex, and I would like to have it back in the resampled version.
Is there a way to do it with native pandas functions?
CodePudding user response:
I've had success resampling with the native resample function. For example:
resample_dict = {
    'Verbundzuordnung': 'mean',
    'ProjektIndex': 'mean',
    'TagDesAbdichtens': 'first',
}
data = data.resample("60T", closed='left', label='left').apply(resample_dict)
You can apply whichever aggregation function you need (in place of mean) to each column (e.g. first, min, max).
See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html for more.
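As a rough illustration of the resample approach, here is a minimal sketch for a single project. It assumes the frame is indexed only by Datum, since resample needs a datetime-like index; with the three-level MultiIndex from the question, the other two levels would have to be handled separately. The Messwert column and the sample values are made up for the example.

```python
import pandas as pd

# Hypothetical data for one project, indexed only by Datum (20-minute steps).
idx = pd.date_range("2021-11-10", periods=6, freq="20min", name="Datum")
data = pd.DataFrame(
    {
        "Messwert": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],        # made-up measurement column
        "TagDesAbdichtens": pd.Timestamp("2021-12-08"),    # static per project
    },
    index=idx,
)

# Average the measurement per hour; keep the static value via 'first'.
hourly = data.resample("60min", closed="left", label="left").agg(
    {"Messwert": "mean", "TagDesAbdichtens": "first"}
)
print(hourly)
```

The "60min" alias is used here instead of "60T" because newer pandas releases deprecate the "T" spelling.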
CodePudding user response:
Instead of mean(), you can do the following:
agg({'TagDesAbdichtens': 'first', 'another_col': 'mean', 'another_col2': 'mean', ... })
That is, you can specify a different aggregate function for each column.
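Applied to the grouping from the question, this could look roughly like the sketch below. The sample data and the Messwert column are invented for illustration; the real frame has other columns. The aggregation dict is built from the existing columns so that everything defaults to mean and only TagDesAbdichtens uses first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two projects, eight 30-minute readings each.
dates = pd.date_range("2021-11-10", periods=8, freq="30min")
frames = []
for verbund, projekt, tag in [(1, 81679, "2021-12-08"), (2, 94574, "2022-01-31")]:
    frames.append(pd.DataFrame({
        "Verbundzuordnung": verbund,
        "ProjektIndex": projekt,
        "Datum": dates,
        "TagDesAbdichtens": pd.Timestamp(tag),            # static per project
        "Messwert": np.arange(len(dates), dtype=float),   # made-up measurement
    }))
all_merged = pd.concat(frames).set_index(
    ["Verbundzuordnung", "ProjektIndex", "Datum"]
)

# Same grouping as in the question, with hourly bins on the Datum level.
grouped = all_merged.groupby([
    pd.Grouper(level="Verbundzuordnung"),
    pd.Grouper(level="ProjektIndex"),
    pd.Grouper(level="Datum", freq="60min"),
])

# Default every column to 'mean', then override the static column with 'first'.
agg_spec = {col: "mean" for col in all_merged.columns}
agg_spec["TagDesAbdichtens"] = "first"

resampled = grouped.agg(agg_spec)
print(resampled)
```

Because TagDesAbdichtens is constant within each (Verbundzuordnung, ProjektIndex) pair, first, last, min and max would all return the same value here.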