How to use the pandas resample method?

I want to downsample a pandas DataFrame with a datetime index using the resample method, but I don't understand the output. I expected the data to be aggregated into 5-second bins ('5s'), but I get 17460145 rows from an original DataFrame of only 100 rows. What is the correct way to use resample?

import numpy as np
import pandas as pd

def random_dates(start, end, n=100):
    # Timestamp.value is in nanoseconds; convert to whole seconds since the epoch
    start_u = start.value // 10**9
    end_u = end.value // 10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')

start = pd.to_datetime('2022-01-01')
end = pd.to_datetime('2023-01-01')
rd = random_dates(start, end)
clas = np.random.choice(['A', 'B', 'C'], size=100)
value = np.random.randint(0, 100, size=100)
df = pd.DataFrame.from_dict({'ts': rd, 'cl': clas, 'vl': value}).set_index('ts').sort_index()

df
Out[48]: 
                    cl  vl
ts                        
2022-01-04 17:25:10  B  27
2022-01-06 19:17:35  C  34
2022-01-17 22:55:25  B   1
2022-01-23 00:33:25  A  20
2022-01-27 18:26:56  A  55
                ..  ..
2022-12-14 07:46:50  C  22
2022-12-18 02:33:52  C  52
2022-12-22 17:35:10  A  52
2022-12-28 04:55:20  A  57
2022-12-29 03:19:00  A  60

[100 rows x 2 columns]

df.groupby(by='cl').resample('5s').mean()
Out[49]: 
                          vl
cl ts                       
A  2022-01-23 00:33:25  20.0
   2022-01-23 00:33:30   NaN
   2022-01-23 00:33:35   NaN
   2022-01-23 00:33:40   NaN
   2022-01-23 00:33:45   NaN
                     ...
C  2022-12-18 02:33:30   NaN
   2022-12-18 02:33:35   NaN
   2022-12-18 02:33:40   NaN
   2022-12-18 02:33:45   NaN
   2022-12-18 02:33:50  52.0

[17460145 rows x 1 columns]

CodePudding user response:

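resample builds a regular grid of 5-second bins between the first and last timestamp of each group, so every empty bin shows up as a NaN row; that is where the 17460145 rows come from. To keep only the bins that actually contain data, group by pd.Grouper instead: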

>>> df.groupby(['cl', pd.Grouper(freq='5s')]).mean()
                          vl
cl ts                       
A  2022-01-22 11:53:30  31.0
   2022-02-01 21:24:55  60.0
   2022-03-20 06:01:05  24.0
   2022-04-03 00:04:05  55.0
   2022-04-03 06:30:10  81.0
...                      ...
C  2022-11-23 23:17:20  92.0
   2022-11-25 07:07:45  27.0
   2022-12-07 00:18:05  88.0
   2022-12-25 10:37:25  77.0
   2022-12-28 14:29:25  33.0

[100 rows x 1 columns]
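
To make the difference easier to see, here is a minimal sketch on a tiny hand-written frame. The three rows are made up for illustration and are not taken from the question; the column names cl and vl and the index name ts follow the question. It compares the number of rows produced by resample with the number produced by pd.Grouper.

```python
import pandas as pd

# Hypothetical sample data (not from the question): two rows for class A
# about a minute apart, and one row for class B.
df = pd.DataFrame(
    {"cl": ["A", "B", "A"], "vl": [10, 30, 20]},
    index=pd.to_datetime(
        ["2022-01-01 00:00:01", "2022-01-01 00:00:03", "2022-01-01 00:01:02"]
    ),
).rename_axis("ts")

# resample creates every 5-second bin between each group's first and last
# timestamp, so bins without data appear as NaN rows.
dense = df.groupby("cl")["vl"].resample("5s").mean()

# pd.Grouper inside groupby only keeps bins that actually contain rows.
sparse = df.groupby(["cl", pd.Grouper(freq="5s")])["vl"].mean()

print(len(dense), len(sparse))
print(dense.dropna())
print(sparse)
```

On this sample, dense has one row per 5-second bin in each group's time span, while sparse has one row per non-empty bin. Dropping the NaN rows from dense leaves the same values as sparse, so resample(...).mean().dropna() is an alternative if you prefer to stay with resample, at the cost of building the full grid first.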