How to take the mean of two subsequent rows iteratively thereby reducing the number of rows?-CodePudding

I have a tibble like so:

library(dplyr)
set.seed(1)
my_tib <- tibble(identifier = rep(letters[1:3], each = 4),
       year = rep(seq(2005, 2020, 5), 3),
       value = rnorm(12, mean = 1000, 100) %>% round()
       )

my_tib

# A tibble: 12 × 3
   identifier  year value
   <chr>      <dbl> <dbl>
 1 a           2005   937
 2 a           2010  1018
 3 a           2015   916
 4 a           2020  1160
 5 b           2005  1033
 6 b           2010   918
 7 b           2015  1049
 8 b           2020  1074
 9 c           2005  1058
10 c           2010   969
11 c           2015  1151
12 c           2020  1039

Now I'd like to shrink down my tibble by taking the mean value for two years each, creating a new column for the year bracket. For example, I'd like to take the mean of 937 and 1018 (977.5) for the new year_bracket 2005-2010.

I'd like to repeat this for all years and all identifiers.

So the first new 5 rows of my tibble look like this:

head(my_new_tib, 5)

# A tibble: 9 × 3
   identifier  year_bracket  value       
   <chr>       <chr>         <dbl>
 1 a           2005-2010     977.5
 2 a           2010-2015     967
 3 a           2015-2020     1038
 4 b           2005-2010     975.5
 5 b           2010-2015     983.5

Ideally, I'm looking for a piped dplyr solution but I'm also curious regarding other solutions.

CodePudding user response：

Using dplyr:

library(dplyr)
my_tib |> 
  group_by(identifier) |> 
  mutate(value = (value   lag(value))/2,
         year_bracket = paste0(lag(year)," - ",year),
         .keep = "unused",
         .before = 2) |> 
  filter(!is.na(value)) |> 
  ungroup()

Output:

# A tibble: 9 x 3
  identifier year_bracket value
  <chr>      <chr>        <dbl>
1 a          2005 - 2010   978.
2 a          2010 - 2015   967 
3 a          2015 - 2020  1038 
4 b          2005 - 2010   976.
5 b          2010 - 2015   984.
6 b          2015 - 2020  1062.
7 c          2005 - 2010  1014.
8 c          2010 - 2015  1060 
9 c          2015 - 2020  1095

CodePudding user response：

Another possible solution:

library(tidyverse)

my_tib %>%
  group_by(identifier) %>% 
  slice(c(1, rep(2:(n()-1), each = 2) , n())) %>%
  group_by(identifier, aux = rep(1:n(), each=2, length.out = n())) %>% 
  summarise(year_bracket = str_c(year, collapse = "_"), value = mean(value), 
    .groups = "drop") %>% select(-aux)

#> # A tibble: 9 × 3
#>   identifier year_bracket value
#>   <chr>      <chr>        <dbl>
#> 1 a          2005_2010     978.
#> 2 a          2010_2015     967 
#> 3 a          2015_2020    1038 
#> 4 b          2005_2010     976.
#> 5 b          2010_2015     984.
#> 6 b          2015_2020    1062.
#> 7 c          2005_2010    1014.
#> 8 c          2010_2015    1060 
#> 9 c          2015_2020    1095