I've got a data frame with the following:
Id final_date_time Speed
2022484408 2016-04-12 07:21:00 97
2022484405 2016-04-12 07:21:05 102
2022484402 2016-04-12 07:21:10 105
2022484407 2016-04-12 07:21:20 103
2022484408 2016-04-12 07:21:25 101
2022484408 2016-04-12 07:22:05 95
2022484410 2016-04-12 07:22:10 91
2022484408 2016-04-12 07:22:15 93
2022484421 2016-04-12 07:22:20 94
2022484408 2016-04-12 07:22:25 93
2022484421 2016-04-12 07:22:35 92
2022484421 2016-04-12 07:22:40 89
2022484447 2016-04-12 07:22:50 83
2022484488 2016-04-12 07:22:55 61
2022484411 2016-04-12 07:23:00 60
The class of "final_date_time" is POSIXct.
How can I find the average of "Speed" on a per minute basis?
Here is the dput info:
structure(list(Id = c(2022484408, 2022484408, 2022484408, 2022484408,
2022484408, 2022484408), new_final_date = structure(c(1460438460,
1460438465, 1460438470, 1460438480, 1460438485, 1460438525), class = c("POSIXct",
"POSIXt"), tzone = ""), Value = c(97L, 102L, 105L, 103L, 101L,
95L)), row.names = c(NA, 6L), class = "data.frame")
I tried a solution that worked on a previous similar problem, (changed the labels), but I couldn't get it to work:
library(xts)
x <- as.xts(dd[,date := as.POSIXct(date)])
period.apply(x, endpoints(x, "minutes", 15), mean)
CodePudding user response:
This becomes very easy if you switch to data.table
. To do so, we first correct your question by making the data readable. I use data.table
for that too, and added commas to your data which was actually not readable: four actual columns with three columns names:
> D <- fread(text="Id, final_date_time, Speed
2022484408, 2016-04-12 07:21:00, 97
2022484405, 2016-04-12 07:21:05, 102
2022484402, 2016-04-12 07:21:10, 105
2022484407, 2016-04-12 07:21:20, 103
2022484408, 2016-04-12 07:21:25, 101
2022484408, 2016-04-12 07:22:05, 95
2022484410, 2016-04-12 07:22:10, 91
2022484408, 2016-04-12 07:22:15, 93
2022484421, 2016-04-12 07:22:20, 94
2022484408, 2016-04-12 07:22:25, 93
2022484421, 2016-04-12 07:22:35, 92
2022484421, 2016-04-12 07:22:40, 89
2022484447, 2016-04-12 07:22:50, 83
2022484488, 2016-04-12 07:22:55, 61
2022484411, 2016-04-12 07:23:00, 60")
> D
Id final_date_time Speed
<int> <POSc> <int>
1: 2022484408 2016-04-12 07:21:00 97
2: 2022484405 2016-04-12 07:21:05 102
3: 2022484402 2016-04-12 07:21:10 105
4: 2022484407 2016-04-12 07:21:20 103
5: 2022484408 2016-04-12 07:21:25 101
6: 2022484408 2016-04-12 07:22:05 95
7: 2022484410 2016-04-12 07:22:10 91
8: 2022484408 2016-04-12 07:22:15 93
9: 2022484421 2016-04-12 07:22:20 94
10: 2022484408 2016-04-12 07:22:25 93
11: 2022484421 2016-04-12 07:22:35 92
12: 2022484421 2016-04-12 07:22:40 89
13: 2022484447 2016-04-12 07:22:50 83
14: 2022484488 2016-04-12 07:22:55 61
15: 2022484411 2016-04-12 07:23:00 60
>
Now that we have that out of the way, we round the Datetime to minutes (and ensure to returns a POSIXct in a single column), and then use data.table
to run a simple aggregation (here: mean()
) by rounded minutes.
> D[, min := as.POSIXct(round(final_date_time, "mins"))][, .(avgspeed=mean(Speed)), by=min]
min avgspeed
<POSc> <num>
1: 2016-04-12 07:21:00 101.6
2: 2016-04-12 07:22:00 93.2
3: 2016-04-12 07:23:00 77.0
>
The code is a one-liner, but it looks a like nicer here with either a line break, or a voluntary split into two lines:
> D[, min := as.POSIXct(round(final_date_time, "mins"))]
> D[, .(avgspeed=mean(Speed)), by=min]
There are base R idioms to do the same via ave
or aggregate
, as there are of course dplyr
calls. I happen to prefer data.table
.
CodePudding user response:
library(data.table)
setDT(x)
x[, .(avg = mean(Speed)), .(Id, interval = cut(final_date_time, breaks = "1 min"))]
# Id interval avg
# 1: 2022484408 2016-04-12 07:21:00 99.000
# 2: 2022484405 2016-04-12 07:21:00 102.000
# 3: 2022484402 2016-04-12 07:21:00 105.000
# 4: 2022484407 2016-04-12 07:21:00 103.000
# 5: 2022484408 2016-04-12 07:22:00 93.667
# 6: 2022484410 2016-04-12 07:22:00 91.000
# 7: 2022484421 2016-04-12 07:22:00 91.667
# 8: 2022484447 2016-04-12 07:22:00 83.000
# 9: 2022484488 2016-04-12 07:22:00 61.000
# 10: 2022484411 2016-04-12 07:23:00 60.000
*I assumed you want to group on Id as well, if not, remove Id
x[, .(avg = mean(Speed)), .(interval = cut(final_date_time, breaks = "1 min"))]