I am new to data analysis with pandas, coming from a MATLAB background. I am trying to group data and then process the individual groups. However, I cannot figure out how to actually access the grouping result.
Here is my setup: I have a pandas DataFrame df with a regularly spaced DatetimeIndex named timestamp at a 10-minute frequency. My data spans several weeks in total. I now want to group the data by day, like so:
grouping = df.groupby([pd.Grouper(level="timestamp", freq="D",)])
Note that I do not want to aggregate the groups (contrary to most examples and tutorials, it seems). I simply want to take each group in turn and process it individually, like so (does not work):
for g in grouping:
    g_df = g.toDataFrame()
    some_processing(g_df)
How do I do that? I haven't found any way to extract the daily DataFrame objects from the DataFrameGroupBy object.
CodePudding user response:
Expand your groups into a dictionary of dataframes:
data = dict(list(df.groupby(df.index.date.astype(str))))
>>> data.keys()
dict_keys(['2021-01-01', '2021-01-02'])
>>> data['2021-01-01']
                        value
timestamp
2021-01-01 00:00:00  0.405630
2021-01-01 01:00:00  0.262235
2021-01-01 02:00:00  0.913946
2021-01-01 03:00:00  0.467516
2021-01-01 04:00:00  0.367712
2021-01-01 05:00:00  0.849070
2021-01-01 06:00:00  0.572143
2021-01-01 07:00:00  0.423401
2021-01-01 08:00:00  0.931463
2021-01-01 09:00:00  0.554809
2021-01-01 10:00:00  0.561663
2021-01-01 11:00:00  0.537471
2021-01-01 12:00:00  0.461099
2021-01-01 13:00:00  0.751878
2021-01-01 14:00:00  0.266371
2021-01-01 15:00:00  0.954553
2021-01-01 16:00:00  0.895575
2021-01-01 17:00:00  0.752671
2021-01-01 18:00:00  0.230219
2021-01-01 19:00:00  0.750243
2021-01-01 20:00:00  0.812728
2021-01-01 21:00:00  0.195416
2021-01-01 22:00:00  0.178367
2021-01-01 23:00:00  0.607105
Note: I changed the group keys to plain strings to make indexing easier: '2021-01-01' instead of Timestamp('2021-01-01 00:00:00', freq='D').
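If you only need to visit each day once, you probably don't need the dictionary at all: iterating over a groupby object yields (key, DataFrame) tuples, so the loop from the question works once the tuple is unpacked. Here is a minimal sketch. The sample frame, its value column, and the body of some_processing are made up for illustration. Note that the Grouper is passed directly rather than wrapped in a list, because with a one-element list newer pandas versions return the key as a 1-tuple.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: exactly two days at 10-minute frequency
idx = pd.date_range("2021-01-01", periods=288, freq="10min", name="timestamp")
df = pd.DataFrame({"value": np.arange(288, dtype=float)}, index=idx)


def some_processing(day_df):
    # Placeholder for the real per-day work; here it just counts the rows
    return len(day_df)


# Pass the Grouper itself (not a list) so each key is a plain Timestamp
grouping = df.groupby(pd.Grouper(level="timestamp", freq="D"))

results = {}
for day, day_df in grouping:
    # day is the group key, day_df is an ordinary DataFrame with that day's rows
    results[day] = some_processing(day_df)
```

Each day_df is a regular DataFrame, so no conversion step like toDataFrame() is needed.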