I have a dataset like the one below; its timestamp column has sub-second (millisecond-level) precision.
pid,timestamp,value
31,2019-03-29 07:14:56.999999756,0.0
31,2019-03-29 07:14:57.250000,0.614595
31,2019-03-29 07:14:57.500000,0.678615
31,2019-03-29 07:14:57.750000,0.687578
31,2019-03-29 07:14:58.000000244,0.559804
31,2019-03-29 07:14:58.250000,0.522672
31,2019-03-29 07:14:58.499999512,0.51627
31,2019-03-29 07:14:58.750000,0.51627
31,2019-03-29 07:14:59.000000244,0.517551
31,2019-03-29 07:14:59.250000,0.51627
31,2019-03-29 07:14:59.500000244,0.509868
31,2019-03-29 07:14:59.750000488,0.513709
31,2019-03-29 07:15:00,0.513709
31,2019-03-29 07:15:00.249999512,0.518831
31,2019-03-29 07:15:00.500000,0.531635
How can I compute the average of the values in every 5-second window? I've tried resample, but it didn't work. Here is my code:
col_list = ["timestamp", "pid","value"]
df = read_csv("data.csv", usecols=col_list)
df['timestamp'] = to_datetime(df['timestamp'], unit='ms')
timestamp_col=df['timestamp'].tolist()
pid_col=df['pid'].tolist()
value_col=df['value'].tolist()
df['timestamp'].resample('5S').mean()
timestamp_col=df['timestamp'].tolist()
Thanks for your help!
CodePudding user response:
Set timestamp as the index before resampling: resample needs a DatetimeIndex, so calling it on a column that still has the default RangeIndex raises a TypeError:
>>> df.set_index('timestamp').groupby('pid') \
.resample('5S')['value'].mean().reset_index()
pid timestamp value
0 31 2019-03-29 07:14:55 0.512767
1 31 2019-03-29 07:15:00 0.521392
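Note that resample aligns its 5-second bins to the clock and labels each bin by its left edge, which is why the first group is labeled 07:14:55 even though the data starts at 07:14:56.99. If you would rather anchor the bins to the first timestamp in the data, resample accepts an origin keyword (assuming pandas >= 1.1, where it was added); with this small sample every row then lands in a single bin starting at the first timestamp:
>>> df.set_index('timestamp').groupby('pid') \
       .resample('5S', origin='start')['value'].mean().reset_index()
   pid                     timestamp     value
0   31 2019-03-29 07:14:56.999999756  0.514492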
Update
Maybe you prefer this version:
>>> df.groupby(['pid', pd.Grouper(freq='5S', key='timestamp')], as_index=False) \
.agg({'pid': 'first', 'timestamp': 'first', 'value': 'mean'})
pid timestamp value
0 31 2019-03-29 07:14:56.999999756 0.512767
1 31 2019-03-29 07:15:00.000000000 0.521392
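The only difference between the two versions is the label: plain resample (and a plain Grouper) labels each group by the left edge of its clock-aligned 5-second bin, while the agg(..., 'first', ...) variant reports the first timestamp actually seen inside each bin. If the bin-edge labels are fine, a minimal sketch of the simpler Grouper route, assuming the same df, reproduces the first result:
>>> df.groupby(['pid', pd.Grouper(freq='5S', key='timestamp')])['value'] \
       .mean().reset_index()
   pid           timestamp     value
0   31 2019-03-29 07:14:55  0.512767
1   31 2019-03-29 07:15:00  0.521392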