Access a pandas group as new data frame

Time:06-21

I am new to data analysis with pandas, coming from a MATLAB background. I am trying to group data and then process the individual groups, but I cannot figure out how to access the grouping result.

Here is my setup: I have a pandas DataFrame df with a regularly spaced DatetimeIndex named timestamp at a 10-minute frequency. The data spans several weeks in total. I now want to group the data by day, like so:

grouping = df.groupby([pd.Grouper(level="timestamp", freq="D")])

Note that I do not want to aggregate the groups (contrary to most examples and tutorials, it seems). I simply want to take each group in turn and process it individually, like so (does not work):

for g in grouping:
  g_df = g.toDataFrame()
  some_processing(g_df)

How do I do that? I haven't found any way to extract daily dataframe objects from the DataFrameGroupBy object.

CodePudding user response:

Expand your groups into a dictionary of dataframes:

>>> data = dict(list(df.groupby(df.index.date.astype(str))))
>>> data.keys()
dict_keys(['2021-01-01', '2021-01-02'])

>>> data['2021-01-01']
                        value
timestamp                    
2021-01-01 00:00:00  0.405630
2021-01-01 01:00:00  0.262235
2021-01-01 02:00:00  0.913946
2021-01-01 03:00:00  0.467516
2021-01-01 04:00:00  0.367712
2021-01-01 05:00:00  0.849070
2021-01-01 06:00:00  0.572143
2021-01-01 07:00:00  0.423401
2021-01-01 08:00:00  0.931463
2021-01-01 09:00:00  0.554809
2021-01-01 10:00:00  0.561663
2021-01-01 11:00:00  0.537471
2021-01-01 12:00:00  0.461099
2021-01-01 13:00:00  0.751878
2021-01-01 14:00:00  0.266371
2021-01-01 15:00:00  0.954553
2021-01-01 16:00:00  0.895575
2021-01-01 17:00:00  0.752671
2021-01-01 18:00:00  0.230219
2021-01-01 19:00:00  0.750243
2021-01-01 20:00:00  0.812728
2021-01-01 21:00:00  0.195416
2021-01-01 22:00:00  0.178367
2021-01-01 23:00:00  0.607105

Note: I changed the group keys to make indexing easier: '2021-01-01' instead of Timestamp('2021-01-01 00:00:00', freq='D').
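
If a dictionary is not needed, the groups can also be consumed directly: iterating over a GroupBy object yields (key, sub-frame) pairs, and each sub-frame is already an ordinary DataFrame. The following is a minimal sketch under some assumptions: the sample data is made up, some_processing is a hypothetical stand-in for the real per-day work, and the Grouper is passed directly rather than inside a list so that each key is a single Timestamp (recent pandas versions return one-element tuples as keys when the grouper is a list of length one).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two days at 10-minute frequency.
index = pd.date_range("2021-01-01", periods=2 * 144, freq="10min", name="timestamp")
df = pd.DataFrame({"value": np.arange(len(index), dtype=float)}, index=index)


def some_processing(day_df):
    # Placeholder for the real per-day work; here it only counts the rows.
    return len(day_df)


grouping = df.groupby(pd.Grouper(level="timestamp", freq="D"))

results = {}
for day, g_df in grouping:
    # day is a Timestamp at midnight; g_df is a plain DataFrame for that day.
    results[day.strftime("%Y-%m-%d")] = some_processing(g_df)
```

A single day can also be pulled out without looping via grouping.get_group(pd.Timestamp("2021-01-01")).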
