R - Group based on time steps and average all values included in them-CodePudding

I have tables like the one below.

df <- data.frame(Metric = c("Stat1", "Stat1", "Stat1", "Stat1",
                            "Stat2", "Stat2", "Stat2",
                            "Stat3", "Stat3", "Stat3", "Stat3"),
                 Timestamp = c("0.000514", "0.060709", "0.091062", "0.134333",
                               "0.000382", "0.060018", "0.133970",
                               "0.007462", "0.078792", "0.115623", "0.148771"),
                 Value = c("10", "20", "25", "30",
                           "11", "21", "31",
                           "12", "22", "32", "37"))

There are multiple metrics and timestamps. Naturally for each timestamp there is a value for that metric. The problem is that between metrics the timestamps are completely different. It would be easier for me to work with the data if I have the same timestamps. I found that the average timestamp is 0.05533.

This is the result I'm trying to get.

df_new <- data.frame(Metric = c("Stat1", "Stat1", "Stat1",
                                "Stat2", "Stat2", "Stat2",
                                "Stat3", "Stat3", "Stat3"),
                     Timestamp = c("0.00", "0.05", "0.10",
                                   "0.00", "0.05", "0.10",
                                   "0.00", "0.05", "0.10"),
                     Value = c("10", "22.5", "30",
                               "11", "21", "31",
                               "12", "22", "34.5"))

I want to have the same timestamps for every metric. Starting with time 0, I want to average all values of a metric that are in the [0s,0.5s] range (then [0.5, 1.0s] and so on).

One issue is that there might be only one value for a metric in that range or multiple. It's not set.

How can I create the second table from the first?

CodePudding user response：

In your case you can use aggregate and floor to round the timestamp to your desired values

Observe that the columns in your example are character strings thus i explicitly must convert it with as.numeric.

df <- data.frame(Metric = c("Stat1", "Stat1", "Stat1", "Stat1",
                            "Stat2", "Stat2", "Stat2",
                            "Stat3", "Stat3", "Stat3", "Stat3"),
                 Timestamp = c("0.000514", "0.060709", "0.091062", "0.134333",
                               "0.000382", "0.060018", "0.133970",
                               "0.007462", "0.078792", "0.115623", "0.148771"),
                 Value = c("10", "20", "25", "30",
                           "11", "21", "31",
                           "12", "22", "32", "37"))

df2 <- aggregate(. ~ I(floor(as.numeric(Timestamp)/0.05)*0.05)   Metric, 
                 df, function(x) mean(as.numeric(x)))
df2
#>   I(floor(as.numeric(Timestamp)/0.05) * 0.05) Metric Timestamp Value
#> 1                                           0  Stat1 0.0005140  10.0
#> 2                                        0.05  Stat1 0.0758855  22.5
#> 3                                         0.1  Stat1 0.1343330  30.0
#> 4                                           0  Stat2 0.0003820  11.0
#> 5                                        0.05  Stat2 0.0600180  21.0
#> 6                                         0.1  Stat2 0.1339700  31.0
#> 7                                           0  Stat3 0.0074620  12.0
#> 8                                        0.05  Stat3 0.0787920  22.0
#> 9                                         0.1  Stat3 0.1321970  34.5

edit

If you want to improve legibility:


df$Timestamp  <- as.numeric(df$Timestamp)
df$Value      <- as.numeric(df$Value)
df$Time_range <- floor(df$Timestamp/0.05)*0.05

df2 <- aggregate(. ~ Time_range   Metric, df, mean)

df2
#>   Time_range Metric Timestamp Value
#> 1       0.00  Stat1 0.0005140  10.0
#> 2       0.05  Stat1 0.0758855  22.5
#> 3       0.10  Stat1 0.1343330  30.0
#> 4       0.00  Stat2 0.0003820  11.0
#> 5       0.05  Stat2 0.0600180  21.0
#> 6       0.10  Stat2 0.1339700  31.0
#> 7       0.00  Stat3 0.0074620  12.0
#> 8       0.05  Stat3 0.0787920  22.0
#> 9       0.10  Stat3 0.1321970  34.5

^{Created on 2022-11-04 with reprex v2.0.2}

CodePudding user response：

library(tidyverse)

v <- c(0,0.05, 0.1)

df %>%
  type_convert()%>%
  group_by(Metric, Timestamp = v[findInterval(Timestamp,v)])%>%
  summarise(Value = mean(Value))

# A tibble: 9 × 3
# Groups:   Metric [3]
  Metric Timestamp Value
  <chr>      <dbl> <dbl>
1 Stat1       0     10  
2 Stat1       0.05  22.5
3 Stat1       0.1   30  
4 Stat2       0     11  
5 Stat2       0.05  21  
6 Stat2       0.1   31  
7 Stat3       0     12  
8 Stat3       0.05  22  
9 Stat3       0.1   34.5