Pandas resample drops (static) datetime column, how do I keep it?

Time:12-05

I'm working with a pandas DataFrame whose MultiIndex has three levels:
[Verbundzuordnung, ProjektIndex, Datum].

I would like to resample the DataFrame hourly on Datum, but doing so drops the rightmost column, TagDesAbdichtens. I would like to keep it, since it is static.

            
Verbundzuordnung    ProjektIndex    Datum                           TagDesAbdichtens
1                   81679           2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
                                    2021-11-10 00:00:00+00:00       2021-12-08
...     ...     ...     ...
2                   94574           2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31
                                    2022-02-28 23:00:00+00:00       2022-01-31

285192 rows × 1 columns

There are additional columns that I left out here to keep the example readable.

I am currently grouping the DataFrame like this in order to resample it:

all_merged = all_merged.groupby([
    pd.Grouper(level='Verbundzuordnung'), 
    pd.Grouper(level='ProjektIndex'), 
    pd.Grouper(level='Datum', freq='H')]
  )

Calling all_merged.mean() gives me the desired output, but with TagDesAbdichtens missing. This value is unique and static for each combination of Verbundzuordnung and ProjektIndex, and I would like to have it back in the resampled version.

Is there a way to do it with native pandas functions?

CodePudding user response:

I've had success resampling using the native resample function. For example,

    resample_dict = {
        'Verbundzuordnung': 'mean',
        'ProjektIndex': 'mean',
        'TagDesAbdichtens': 'first'
    }

    data = data.resample("60T", closed='left', label='left').apply(resample_dict)

You can apply whichever aggregation function you need to each column in place of mean (e.g. first, min, max).

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html for more.
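As a rough, self-contained sketch of this idea: the frame below is made-up sample data with a plain DatetimeIndex (which resample needs), and the column name "value" is only a placeholder for one of the numeric columns left out of the question. The "60min" alias is used instead of "60T" because newer pandas releases deprecate the "T" spelling.

```python
import pandas as pd

# Made-up sample data: eight 15-minute readings on a tz-aware DatetimeIndex.
idx = pd.date_range("2021-11-10", periods=8, freq="15min", tz="UTC")
data = pd.DataFrame(
    {
        "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],  # placeholder numeric column
        "TagDesAbdichtens": pd.Timestamp("2021-12-08"),     # static per project
    },
    index=idx,
)

# Average the numeric column per hour, but keep the first value of the static column.
hourly = data.resample("60min", closed="left", label="left").agg(
    {"value": "mean", "TagDesAbdichtens": "first"}
)
print(hourly)
```

With this sample data the result has two hourly rows, and TagDesAbdichtens survives because it is aggregated with first rather than mean.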

CodePudding user response:

Instead of mean(), you can call agg() on the grouped object:

agg({'TagDesAbdichtens': 'first', 'another_col': 'mean', 'another_col2': 'mean', ... })

That is, you can specify a different aggregate function for each column.
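Applied to the grouping from the question, this could look roughly like the sketch below. The data is invented, "Temperatur" is a hypothetical stand-in for the omitted numeric columns, and the frequency is written as "60min" rather than "H" only to avoid the alias deprecation in newer pandas versions.

```python
import pandas as pd

# Invented sample data: two projects, four half-hourly rows each.
dates = pd.date_range("2021-11-10", periods=4, freq="30min", tz="UTC")
index = pd.MultiIndex.from_arrays(
    [[1] * 4 + [2] * 4, [81679] * 4 + [94574] * 4, dates.append(dates)],
    names=["Verbundzuordnung", "ProjektIndex", "Datum"],
)
all_merged = pd.DataFrame(
    {
        # "Temperatur" is a hypothetical numeric column, not from the question.
        "Temperatur": [10.0, 12.0, 14.0, 16.0, 20.0, 22.0, 24.0, 26.0],
        "TagDesAbdichtens": [pd.Timestamp("2021-12-08")] * 4
        + [pd.Timestamp("2022-01-31")] * 4,
    },
    index=index,
)

# Same grouping as in the question: two plain levels plus an hourly bin on Datum.
grouped = all_merged.groupby(
    [
        pd.Grouper(level="Verbundzuordnung"),
        pd.Grouper(level="ProjektIndex"),
        pd.Grouper(level="Datum", freq="60min"),
    ]
)

# Use "mean" for every column by default, then override the static column.
agg_spec = {col: "mean" for col in all_merged.columns}
agg_spec["TagDesAbdichtens"] = "first"

hourly = grouped.agg(agg_spec)
print(hourly)
```

Building the dictionary from all_merged.columns avoids listing every numeric column by hand; only the static column needs an explicit entry.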
