Home > Enterprise >  Slide window mean
Slide window mean

Time:07-13

I'm trying to do a slide window mean in a dataframe with fixed number of elements. I'm usint tidyverse and the slider package. lets use this example

> example
# A tibble: 28 × 3
# Groups:   a [4]
   a          b order
   <chr>  <dbl> <dbl>
 1 a     0.199      1
 2 a     0.711      2
 3 a     0.506      3
 4 a     0.0233     4
 5 a     0.994      5
 6 a     0.0920     6
 7 a     0.0633     7
 8 b     0.208      1
 9 b     0.536      2
10 b     0.954      3
# … with 18 more rows

I want to walk in this dataframe collecting fixed 5 rows of a same group (column a), calculate the mean of the values in the column b, and get a final dataframe with the mean of each consecutive 5 rows. I'd like to do something less verbose than

example %>% 
  filter(a=='a') %>% 
  filter(order >= 1 & order <= 5) %>% 
  summarise(
    bmean = mean(b)
  ) %>% 
  bind_rows(
    example %>% 
      filter(a=='a') %>% 
      filter(order >= 2 & order <= 6) %>% 
      summarise(
        bmean = mean(b)
      ) 
  ) %>% 
  bind_rows(
    example %>% 
      filter(a=='a') %>% 
      filter(order >= 3 & order <= 7) %>% 
      summarise(
        bmean = mean(b)
      )
  )
....

In the end I'd like to have something like this

# A tibble: 3 × 2
  a     bmean
  <chr> <dbl>
1 a     0.487
2 a     0.465
3 a     0.336

dput of the example

structure(list(a = c("a", "a", "a", "a", "a", "a", "a", "b", 
"b", "b", "b", "b", "b", "b", "c", "c", "c", "c", "c", "c", "c", 
"d", "d", "d", "d", "d", "d", "d"), b = c(0.199394755531102, 
0.711478831479326, 0.505893802503124, 0.0233104680664837, 0.994404575554654, 
0.092042422387749, 0.0632751111406833, 0.208315970376134, 0.535682428395376, 
0.95367618277669, 0.960666978731751, 0.498914737952873, 0.930273110279813, 
0.729706238722429, 0.248024080879986, 0.411097032949328, 0.235628247493878, 
0.801356019219384, 0.921010897262022, 0.0789776453748345, 0.887602533679456, 
0.137188882334158, 0.454170317854732, 0.92619909462519, 0.556802915409207, 
0.396265675779432, 0.345948834205046, 0.904526130529121), order = c(1, 
2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 2, 3, 4, 5, 6, 7, 1, 
2, 3, 4, 5, 6, 7)), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -28L), groups = structure(list(
    a = c("a", "b", "c", "d"), .rows = structure(list(1:7, 8:14, 
        15:21, 22:28), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -4L), .drop = TRUE))

Thanks in advance

CodePudding user response:

As the data is already grouped, we don't need the group_by, just use slide on the 'b' column with .after = 4

library(dplyr)
library(slider)
example %>% 
 summarise(bmean = slide_dbl(b, .f = mean, .after = 4), .groups = 'drop')

-output

# A tibble: 28 × 2
   a      bmean
   <chr>  <dbl>
 1 a     0.487 
 2 a     0.465 
 3 a     0.336 
 4 a     0.293 
 5 a     0.383 
 6 a     0.0777
 7 a     0.0633
 8 b     0.631 
 9 b     0.776 
10 b     0.815 
# … with 18 more rows

If we need to get the mean only when there are 5 elements, specify .complete = TRUE to return as NA and later remove the NA elements (na.omit)

example %>% 
 summarise(bmean = slide_dbl(b, .f = mean, .after = 4,
    .complete = TRUE), .groups = 'drop') %>%
 na.omit()

-output

# A tibble: 12 × 2
   a     bmean
   <chr> <dbl>
 1 a     0.487
 2 a     0.465
 3 a     0.336
 4 b     0.631
 5 b     0.776
 6 b     0.815
 7 c     0.523
 8 c     0.490
 9 c     0.585
10 d     0.494
11 d     0.536
12 d     0.626

CodePudding user response:

I did not really understand what exactly is the groups of five you want to do. I see two possible answer:

df %>%
  filter(a=='a') %>% 
  mutate(ID=c(1:NROW(.)),
         grp=ID %/% 5 %>% 
            as.factor()) %>%
  group_by(grp) %>%
  summarise(bmeans=mean(b))

An other solution which get to your result is:

d2 <-  d %>% filter(a=='a')
map(c(1:length(d2)), function(x){
  if(x 4>NROW(d2)) k <- NROW(d2) else k <- x 4
  mean(d2$b[x:k])
  }) 
  • Related