Goal

Group a pandas dataframe in 30s intervals and extract the data to plot it.

Example

import pandas as pd

log = [
        ['2022/10/10_6:13:39', '6328f0c6ad70889fd28dcd07'],
        ['2022/10/10_6:13:49', '6328f0c6ad70889fd28dcd07'],
        ['2022/10/10_6:14:23', '6328f0c6ad70889fd28dcd07'],
        ['2022/10/10_6:14:25', '6328b959a5745f6fa5206fa6'],
        ['2022/10/10_6:15:4', '6328b959a5745f6fa5206fa6'],
        ['2022/10/10_6:15:52', '628fa4ac88be7ffeb9b7e7e3']]

df = pd.DataFrame(log,
                 columns=['timestamp', 'data'])

# convert to timestamp format
df['timestamp'] = pd.to_datetime(df['timestamp'],format='%Y/%m/%d_%H:%M:%S')

The dataframe:

            timestamp                      data
0 2022-10-10 06:13:39  6328f0c6ad70889fd28dcd07
1 2022-10-10 06:13:49  6328f0c6ad70889fd28dcd07
2 2022-10-10 06:14:23  6328f0c6ad70889fd28dcd07
3 2022-10-10 06:14:25  6328b959a5745f6fa5206fa6
4 2022-10-10 06:15:04  6328b959a5745f6fa5206fa6
5 2022-10-10 06:15:52  628fa4ac88be7ffeb9b7e7e3

My approach

# Group in intervals 
g = df.groupby(pd.Grouper(key='timestamp',freq='30s'))

The issue

I would like to see the grouped dataframe. How do I do that?
I would like to plot how many unique data there was within each interval.

CodePudding user response：

when you use group by you need an aggregation function in your case if you want the number of values you can use count(). To check the data grouped you can use list. Then you can just plot the data using a bar plot

grouped_data = df.groupby(pd.Grouper(key='timestamp',freq='30s')).agg(list)

grouped_counts = df.groupby(pd.Grouper(key='timestamp',freq='30s')).count()
grouped_counts.plot(kind='bar')

EDIT for unique values

if you want unique values, you can aggregate by a set and count the values

grouped_data = df.groupby(pd.Grouper(key='timestamp',freq='30s')).agg(set)
grouped_data['counts'] = grouped_data['data'].apply(lambda x: len(x))
grouped_data.plot(y='counts', kind='bar')