Home > Net >  Mean of grouped data using Pandas?
Mean of grouped data using Pandas?

Time:07-23

I am attempting to use Pandas to calculate hourly averages from one minute data for a fairly large dataset. My dataset is stored as a .csv, and looks like:

,timestamp_local,pm25
0,5/30/2022 16:59,2.2802
1,5/30/2022 16:58,2.7055
2,5/30/2022 16:57,1.6602
3,5/30/2022 16:56,1.5924
4,5/30/2022 16:55,2.2792
5,5/30/2022 16:54,2.2219

I attempted to read the .csv file as a Pandas DataFrame and then store the hourly means in a new DataFrame that I will export:

import pandas as pd

df = pd.read_csv('extracted_data.csv')
df['timestamp_local'] = pd.to_datetime(df['timestamp_local'])

df1 = df.groupby([pd.Grouper(freq='H', key='timestamp_local'), 'pm25']).mean()

print(df1)

After executing this script, I am left with the minute data grouped by it's respective hour:

timestamp_local     pm25       unnamed: 0      
2022-04-30 17:00:00 0.8766     42917.0
                    0.9124     42913.0
                    0.9647     42893.0
                    0.9842     42883.0
                    1.0974     42894.0
...                                ...
2022-05-30 16:00:00 3.0069        43.0
                    3.1727        40.0
                    3.2482        30.0
                    3.3628        42.0
                    4.5797        13.0

My goal is to eliminate the minute data altogether so I would just have the timestamp and the calculated hourly average, but I am unsure how to go about this. Any advice would be greatly appreciated.

CodePudding user response:

have you tried this? With a pandas pipeline you should be able to stack grouping and resampling methods.

df1 = df.groupby('pm25']).resample('H').mean()

CodePudding user response:

I was able to figure this out:

import pandas as pd

df = pd.read_csv('extracted_data.csv')
df.index = pd.to_datetime(df['timestamp_local'])

df1 = df.groupby([df.index.day, df.index.hour]).mean()

df1.to_excel('output.xlsx')

This gave me the data grouped by day & hour with the calculated hourly mean of the 'pm25' parameter:

timestamp_local timestamp_local      pm25
1               0                    4.917292
                1                    2.976207
                2                    15.244288
                3                    5.802312
                4                    6.185230
...                                     ...        ...
30              19                   2.388808
                20                   1.300867
                21                   2.053445
                22                   7.007275
                23                   10.163283
  • Related