I would like to calculate the mean of every 5 rows in my data frame. Here is my df:
Time | value |
---|---|
03/06/2021 06:15:00 | NA |
03/06/2021 06:16:00 | NA |
03/06/2021 06:17:00 | 20 |
03/06/2021 06:18:00 | 22 |
03/06/2021 06:19:00 | 25 |
03/06/2021 06:20:00 | NA |
03/06/2021 06:21:00 | 31 |
03/06/2021 06:22:00 | 23 |
03/06/2021 06:23:00 | 19 |
03/06/2021 06:24:00 | 25 |
03/06/2021 06:25:00 | 34 |
03/06/2021 06:26:00 | 42 |
03/06/2021 06:27:00 | NA |
03/06/2021 06:28:00 | 19 |
03/06/2021 06:29:00 | 17 |
03/06/2021 06:30:00 | 25 |
I already have a loop that works well for calculating the mean of each block of 5 rows. My problem is with the mean function itself.
The problem is:
- If I set na.rm = FALSE, the mean is NA as soon as there is an NA in a block of 5 values.
- If I set na.rm = TRUE in the mean function, the averages are shifted so that each one still takes 5 values.
I would like the NAs not to interfere with the average: when there is an NA in a block of 5 values, the average should be computed on the remaining 4 values only.
How can I do this? Thanks for your help!
CodePudding user response:
You can solve your problem by introducing a dummy variable that groups your observations into sets of five and then calculating the mean within each group. Here's an MWE, based on the tidyverse, that assumes your data is in a data.frame named df.
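library(tidyverse)
df %>%
  # rows 1-5 get Group 1, rows 6-10 get Group 2, and so on
  mutate(Group = 1 + floor((row_number() - 1) / 5)) %>%
  group_by(Group) %>%
  # na.rm = TRUE drops NAs within each group only
  summarise(Mean = mean(value, na.rm = TRUE), .groups = "drop")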
# A tibble: 4 × 2
Group Mean
<dbl> <dbl>
1 1 22.3
2 2 24.5
3 3 28
4 4 25
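The same grouping idea can be sketched in base R without the tidyverse. This is only a minimal illustration: the vector value is copied from the question, and the names group, group_means and group_n are made up for the example.

```r
# Values copied from the question (16 one-minute readings).
value <- c(NA, NA, 20, 22, 25, NA, 31, 23, 19, 25, 34, 42, NA, 19, 17, 25)

# Group index: rows 1-5 -> 1, rows 6-10 -> 2, and so on.
group <- (seq_along(value) - 1) %/% 5 + 1

# Mean within each group; NAs are dropped inside the group only,
# so the blocks themselves never shift.
group_means <- tapply(value, group, mean, na.rm = TRUE)

# How many non-missing values each mean is based on.
group_n <- tapply(!is.na(value), group, sum)

print(data.frame(Group = as.integer(names(group_means)),
                 Mean  = as.vector(group_means),
                 N     = as.vector(group_n)))
```

The N column makes it visible that the first group is averaged over 3 values and the second over 4. Note that a group containing only NAs would give NaN with na.rm = TRUE.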
CodePudding user response:
A solution based on purrr::map_dfr:
library(purrr)
df <- data.frame(
stringsAsFactors = FALSE,
time = c("03/06/2021 06:15:00","03/06/2021 06:16:00",
"03/06/2021 06:17:00",
"03/06/2021 06:18:00","03/06/2021 06:19:00",
"03/06/2021 06:20:00","03/06/2021 06:21:00",
"03/06/2021 06:22:00","03/06/2021 06:23:00",
"03/06/2021 06:24:00","03/06/2021 06:25:00",
"03/06/2021 06:26:00",
"03/06/2021 06:27:00","03/06/2021 06:28:00",
"03/06/2021 06:29:00","03/06/2021 06:30:00"),
value = c(NA,NA,20L,22L,
25L,NA,31L,23L,19L,25L,34L,42L,NA,19L,17L,
25L)
)
# Rolling mean: window .x covers rows .x to .x + 5 (six rows), NAs ignored
map_dfr(1:(nrow(df) - 5),
        ~ data.frame(Group = .x, Mean = mean(df$value[.x:(.x + 5)], na.rm = TRUE)))
#> Group Mean
#> 1 1 22.33333
#> 2 2 24.50000
#> 3 3 24.20000
#> 4 4 24.00000
#> 5 5 24.60000
#> 6 6 26.40000
#> 7 7 29.00000
#> 8 8 28.60000
#> 9 9 27.80000
#> 10 10 27.40000
#> 11 11 27.40000
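The output above comes from overlapping windows (rows 1 to 6, 2 to 7, and so on). If non-overlapping blocks of five rows are wanted instead, as the question describes, one possible sketch is to step the start index by 5. It uses base R's vapply rather than purrr so that it runs without extra packages; block_size, starts and block_means are names invented for the example.

```r
# Values copied from the question.
value <- c(NA, NA, 20, 22, 25, NA, 31, 23, 19, 25, 34, 42, NA, 19, 17, 25)

block_size <- 5

# First row of each non-overlapping block: 1, 6, 11, 16.
starts <- seq(1, length(value), by = block_size)

block_means <- vapply(starts, function(i) {
  # The last block may be shorter than block_size, so clip at the end.
  idx <- i:min(i + block_size - 1, length(value))
  mean(value[idx], na.rm = TRUE)
}, numeric(1))

print(data.frame(Group = seq_along(starts), Mean = block_means))
```

Because the NAs are dropped after the rows have been assigned to a block, a block with one NA is averaged over its 4 remaining values.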
CodePudding user response:
If you want to take the average of every 5 minutes, you can use lubridate's floor_date/ceiling_date functions to round the time.
library(dplyr)
library(lubridate)
df %>%
mutate(time = mdy_hms(time),
time = floor_date(time, '5 mins')) %>%
group_by(time) %>%
summarise(value = mean(value, na.rm = TRUE))
# time value
# <dttm> <dbl>
#1 2021-03-06 06:15:00 22.3
#2 2021-03-06 06:20:00 24.5
#3 2021-03-06 06:25:00 28
#4 2021-03-06 06:30:00 25