I have a dataset like below, Its time col is based on milliseconds.
pid_col ,timestamp_col ,value_col
31,2019-03-29 07:14:56.999999756,0.0
31,2019-03-29 07:14:57.250000,0.614595
31,2019-03-29 07:14:57.500000,0.678615
31,2019-03-29 07:14:57.750000,0.687578
31,2019-03-29 07:14:58.000000244,0.559804
31,2019-03-29 07:14:58.250000,0.522672
31,2019-03-29 07:14:58.499999512,0.51627
31,2019-03-29 07:14:58.750000,0.51627
31,2019-03-29 07:14:59.000000244,0.517551
31,2019-03-29 07:14:59.250000,0.51627
31,2019-03-29 07:14:59.500000244,0.509868
31,2019-03-29 07:14:59.750000488,0.513709
31,2019-03-29 07:15:00,0.513709
31,2019-03-29 07:15:00.249999512,0.518831
31,2019-03-29 07:15:00.500000,0.531635
How could I compute the average of values in every 5 seconds??for this dataset I should exactly compute the avg of values every five seconds... I mean values for the first 5 seconds should compute between 7:14:56 till 7:15:01 and so on for every 5 seconds.here is my code:
col_list = ["timestamp", "pid","value"]
df = read_csv("data.csv", usecols=col_list)
df['timestamp'] = to_datetime(df['timestamp'], unit='ms')
df = df.groupby(['pid', Grouper(freq='5S', key='timestamp')], as_index=False) \
.agg({'timestamp': 'first', 'value': 'mean'})
Thanks for your helps
CodePudding user response:
There's a nice library called datetime
which is able to make operations between dates. For exemple:
from datetime import datetime, timedelta
# datetime(year, month, day, hour, minute, second, microsecond)
time0 = datetime(2019, 3, 29, 7, 14, 57, 500000)
print(time0)
fiveseconds = timedelta(seconds=5)
print(fiveseconds)
time1 = time0 fiveseconds
print(time1)
gives the output
2019-03-29 07:14:57.500000
0:00:05
2019-03-29 07:15:02.500000
And you can make comparations between then:
from datetime import datetime, timedelta
time0 = datetime(2019, 3, 29, 7, 14, 57, 500000)
fourseconds = timedelta(seconds=4)
fiveseconds = timedelta(seconds=5)
sixseconds = timedelta(seconds=6)
time1 = time0 fiveseconds
print(time1 < (time0 fourseconds)) # False
print(time1 < (time0 sixseconds)) # True
So, for your problem:
from datetime import datetime, timedelta
from numpy import floor
def convert(timestr):
"""
It receives a string, like ""2019-03-29 07:14:57.250000"
And returns a datetime instance
"""
date = timestr.split(" ")
year, month, day = date[0].split("-")
year = int(year)
month = int(month)
day = int(day)
hour, minute, second = date[1].split(":")
hour = int(hour)
minute = int(minute)
intsecond = int(second.split(".")[0])
if "." in second:
microsecond = int(floor(1e 6 * float("0." second.split(".")[1])))
else:
microsecond = 0
return datetime(year, month, day, hour, minute, intsecond, microsecond)
listtimes = ["2019-03-29 07:14:56.999999756",
"2019-03-29 07:14:57.250000",
"2019-03-29 07:14:57.500000",
"2019-03-29 07:14:57.750000",
"2019-03-29 07:14:58.000000244",
"2019-03-29 07:14:58.250000",
"2019-03-29 07:14:58.499999512",
"2019-03-29 07:14:58.750000",
"2019-03-29 07:14:59.000000244",
"2019-03-29 07:14:59.250000",
"2019-03-29 07:14:59.500000244",
"2019-03-29 07:14:59.750000488",
"2019-03-29 07:15:00",
"2019-03-29 07:15:00.249999512",
"2019-03-29 07:15:00.500000"]
listvalues = [0.0,
0.614595,
0.678615,
0.687578,
0.559804,
0.522672,
0.51627,
0.51627,
0.517551,
0.51627,
0.509868,
0.513709,
0.513709,
0.518831,
0.531635]
dt = timedelta(seconds=5)
averagevalues = []
time0 = convert(listtimes[0])
time1 = time0 dt
counter = 0
mysum = 0
for i, v in enumerate(listvalues):
if convert(listtimes[i]) >= time1:
averagevalues.append(mysum / counter)
counter = 0
mysum = 0
time1 = dt
counter = 1
mysum = v
if counter != 0:
averagevalues.append(mysum / counter)
print(averagevalues)
gives the result
[0.5144918]
Therefore, if you have a bigger list of values, and bigger times, the list averagevalues
will group the average for each 5 seconds
. In this exemple all the times were between 2019-03-29 07:14:56
and "2019-03-29 07:15:01
, so we got only one value inside averagevalues