I want to resample a pandas datetime-indexed DataFrame using the resample method, but I don't understand the output. I expected the data to be grouped into 5-second bins, yet I get 17460145 rows from an original DataFrame of 100 rows. What is the correct way to use resample?
import numpy as np
import pandas as pd
def random_dates(start, end, n=100):
    # Convert the nanosecond timestamps to whole seconds since the epoch
    start_u = start.value // 10**9
    end_u = end.value // 10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
start = pd.to_datetime('2022-01-01')
end = pd.to_datetime('2023-01-01')
rd = random_dates(start, end)
clas = np.random.choice(['A', 'B', 'C'], size=100)
value = np.random.randint(0, 100, size=100)
df = pd.DataFrame.from_dict({'ts': rd, 'cl': clas, 'vl': value}).set_index('ts').sort_index()
df
Out[48]:
cl vl
ts
2022-01-04 17:25:10 B 27
2022-01-06 19:17:35 C 34
2022-01-17 22:55:25 B 1
2022-01-23 00:33:25 A 20
2022-01-27 18:26:56 A 55
.. ..
2022-12-14 07:46:50 C 22
2022-12-18 02:33:52 C 52
2022-12-22 17:35:10 A 52
2022-12-28 04:55:20 A 57
2022-12-29 03:19:00 A 60
[100 rows x 2 columns]
df.groupby(by='cl').resample('5s').mean()
Out[49]:
vl
cl ts
A 2022-01-23 00:33:25 20.0
2022-01-23 00:33:30 NaN
2022-01-23 00:33:35 NaN
2022-01-23 00:33:40 NaN
2022-01-23 00:33:45 NaN
...
C 2022-12-18 02:33:30 NaN
2022-12-18 02:33:35 NaN
2022-12-18 02:33:40 NaN
2022-12-18 02:33:45 NaN
2022-12-18 02:33:50 52.0
[17460145 rows x 1 columns]
CodePudding user response:
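Within each group, resample builds a regular 5-second grid from the first to the last timestamp and emits a row for every bin, including the empty ones (hence the NaN rows and the huge row count). To keep only the bins that actually contain data, group by cl together with pd.Grouper: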
>>> df.groupby(['cl', pd.Grouper(freq='5s')]).mean()
vl
cl ts
A 2022-01-22 11:53:30 31.0
2022-02-01 21:24:55 60.0
2022-03-20 06:01:05 24.0
2022-04-03 00:04:05 55.0
2022-04-03 06:30:10 81.0
... ...
C 2022-11-23 23:17:20 92.0
2022-11-25 07:07:45 27.0
2022-12-07 00:18:05 88.0
2022-12-25 10:37:25 77.0
2022-12-28 14:29:25 33.0
[100 rows x 1 columns]
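
To make the difference easier to see, here is a minimal sketch on a small hand-made frame (the timestamps and values below are made up for illustration and are not from the question's random data). It compares the dense per-group resample with the pd.Grouper approach, and also shows that dropping the NaN rows from the resample result should give the same values.

```python
import pandas as pd

# Hypothetical toy data: 5 rows, two classes, a few seconds apart
ts = pd.to_datetime([
    "2022-01-01 00:00:01",
    "2022-01-01 00:00:03",
    "2022-01-01 00:00:21",
    "2022-01-01 00:00:02",
    "2022-01-01 00:00:12",
])
df = pd.DataFrame(
    {"cl": ["A", "A", "A", "B", "B"], "vl": [10, 20, 30, 40, 50]},
    index=ts,
)
df.index.name = "ts"
df = df.sort_index()

# Dense: one row per 5-second bin between each group's first and last
# timestamp, so empty bins show up as NaN
dense = df.groupby("cl")["vl"].resample("5s").mean()

# Sparse: only the (cl, 5-second bin) combinations that contain data
sparse = df.groupby(["cl", pd.Grouper(freq="5s")])["vl"].mean()

print(len(dense))   # 8 rows: 5 bins for A, 3 bins for B
print(len(sparse))  # 4 rows: only the occupied bins

# Dropping the empty bins from the dense result yields the same values
print(dense.dropna().tolist() == sparse.tolist())
```

If the regular grid is what you actually want (for plotting or interpolation, say), resample is the right tool; if you only want to aggregate rows that fall into the same 5-second window, the pd.Grouper form avoids creating millions of empty bins.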