I have tables like the one below.
df <- data.frame(Metric = c("Stat1", "Stat1", "Stat1", "Stat1",
"Stat2", "Stat2", "Stat2",
"Stat3", "Stat3", "Stat3", "Stat3"),
Timestamp = c("0.000514", "0.060709", "0.091062", "0.134333",
"0.000382", "0.060018", "0.133970",
"0.007462", "0.078792", "0.115623", "0.148771"),
Value = c("10", "20", "25", "30",
"11", "21", "31",
"12", "22", "32", "37"))
There are multiple metrics and timestamps. Naturally, for each timestamp there is a value for that metric. The problem is that the timestamps differ completely between metrics. It would be easier for me to work with the data if every metric shared the same timestamps. I found that the average timestamp is 0.05533.
This is the result I'm trying to get.
df_new <- data.frame(Metric = c("Stat1", "Stat1", "Stat1",
"Stat2", "Stat2", "Stat2",
"Stat3", "Stat3", "Stat3"),
Timestamp = c("0.00", "0.05", "0.10",
"0.00", "0.05", "0.10",
"0.00", "0.05", "0.10"),
Value = c("10", "22.5", "30",
"11", "21", "31",
"12", "22", "34.5"))
I want to have the same timestamps for every metric. Starting at time 0, I want to average all values of a metric that fall in the [0 s, 0.05 s] range (then [0.05 s, 0.10 s], and so on).
One complication is that a range may contain only one value for a metric, or several. The count is not fixed.
How can I create the second table from the first?
CodePudding user response:
In your case you can use aggregate and floor to round each timestamp down to the start of its 0.05 s range.
Note that the columns in your example are character strings, so I have to convert them explicitly with as.numeric.
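As a quick check of the binning arithmetic, here is a minimal sketch on the four Stat1 timestamps from the question. The names `ts`, `width`, `bin_index` and `bin_start` are only illustrative and not part of the original answer.

```r
# Stat1 timestamps from the question, converted from character to numeric
ts <- as.numeric(c("0.000514", "0.060709", "0.091062", "0.134333"))

width <- 0.05                    # assumed bin width in seconds

# Integer index of the bin each timestamp falls into
bin_index <- floor(ts / width)

# Left edge of that bin, used as the common timestamp
bin_start <- bin_index * width

print(data.frame(ts, bin_index, bin_start))
```

Timestamps 0.060709 and 0.091062 land in the same bin, which is why their values get averaged together.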
df <- data.frame(Metric = c("Stat1", "Stat1", "Stat1", "Stat1",
"Stat2", "Stat2", "Stat2",
"Stat3", "Stat3", "Stat3", "Stat3"),
Timestamp = c("0.000514", "0.060709", "0.091062", "0.134333",
"0.000382", "0.060018", "0.133970",
"0.007462", "0.078792", "0.115623", "0.148771"),
Value = c("10", "20", "25", "30",
"11", "21", "31",
"12", "22", "32", "37"))
df2 <- aggregate(. ~ I(floor(as.numeric(Timestamp)/0.05)*0.05) + Metric,
df, function(x) mean(as.numeric(x)))
df2
#> I(floor(as.numeric(Timestamp)/0.05) * 0.05) Metric Timestamp Value
#> 1 0 Stat1 0.0005140 10.0
#> 2 0.05 Stat1 0.0758855 22.5
#> 3 0.1 Stat1 0.1343330 30.0
#> 4 0 Stat2 0.0003820 11.0
#> 5 0.05 Stat2 0.0600180 21.0
#> 6 0.1 Stat2 0.1339700 31.0
#> 7 0 Stat3 0.0074620 12.0
#> 8 0.05 Stat3 0.0787920 22.0
#> 9 0.1 Stat3 0.1321970 34.5
Edit:
If you want to improve legibility, convert the columns and compute the time range first:
df$Timestamp <- as.numeric(df$Timestamp)
df$Value <- as.numeric(df$Value)
df$Time_range <- floor(df$Timestamp/0.05)*0.05
df2 <- aggregate(. ~ Time_range + Metric, df, mean)
df2
#> Time_range Metric Timestamp Value
#> 1 0.00 Stat1 0.0005140 10.0
#> 2 0.05 Stat1 0.0758855 22.5
#> 3 0.10 Stat1 0.1343330 30.0
#> 4 0.00 Stat2 0.0003820 11.0
#> 5 0.05 Stat2 0.0600180 21.0
#> 6 0.10 Stat2 0.1339700 31.0
#> 7 0.00 Stat3 0.0074620 12.0
#> 8 0.05 Stat3 0.0787920 22.0
#> 9 0.10 Stat3 0.1321970 34.5
Created on 2022-11-04 with reprex v2.0.2
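The output above also carries an averaged Timestamp column, which the desired table does not have. One possible variation, sketched here under the assumption that only Value should be averaged, names Value on the left-hand side of the formula and then renames the bin column. The object name `res` is illustrative.

```r
# Data from the question, with numeric columns from the start
df <- data.frame(
  Metric = rep(c("Stat1", "Stat2", "Stat3"), times = c(4, 3, 4)),
  Timestamp = c(0.000514, 0.060709, 0.091062, 0.134333,
                0.000382, 0.060018, 0.133970,
                0.007462, 0.078792, 0.115623, 0.148771),
  Value = c(10, 20, 25, 30, 11, 21, 31, 12, 22, 32, 37)
)

# Left edge of the 0.05 s bin each row belongs to
df$Time_range <- floor(df$Timestamp / 0.05) * 0.05

# Average only Value within each (Time_range, Metric) group
res <- aggregate(Value ~ Time_range + Metric, data = df, FUN = mean)

# Rename and reorder to match the layout of the desired table
names(res)[names(res) == "Time_range"] <- "Timestamp"
res <- res[, c("Metric", "Timestamp", "Value")]
print(res)
```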
CodePudding user response:
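library(tidyverse)
# Left edges of the time ranges
v <- c(0,0.05, 0.1)
df %>%
# Convert the character columns to numeric
type_convert()%>%
# findInterval gives the index of the range each timestamp falls into
group_by(Metric, Timestamp = v[findInterval(Timestamp,v)])%>%
summarise(Value = mean(Value))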
# A tibble: 9 × 3
# Groups: Metric [3]
Metric Timestamp Value
<chr> <dbl> <dbl>
1 Stat1 0 10
2 Stat1 0.05 22.5
3 Stat1 0.1 30
4 Stat2 0 11
5 Stat2 0.05 21
6 Stat2 0.1 31
7 Stat3 0 12
8 Stat3 0.05 22
9 Stat3 0.1 34.5
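The breakpoints in `v` are hard-coded for this sample. For longer recordings they could be generated instead, as in this base R sketch. It assumes a fixed width of 0.05 s and ranges starting at 0, and the names `ts`, `width` and `bin_start` are illustrative.

```r
# A few timestamps from the question's data
ts <- c(0.000514, 0.060709, 0.091062, 0.134333, 0.148771)

width <- 0.05                        # assumed bin width in seconds

# Left edges of all bins needed to cover the data
v <- seq(0, max(ts), by = width)

# Map each timestamp to the left edge of its bin
bin_start <- v[findInterval(ts, v)]
print(data.frame(ts, bin_start))
```

The resulting `v` can be dropped into the `group_by` call above in place of the hand-written vector.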