I have a tibble like so:
library(dplyr)
set.seed(1)
my_tib <- tibble(identifier = rep(letters[1:3], each = 4),
year = rep(seq(2005, 2020, 5), 3),
value = rnorm(12, mean = 1000, 100) %>% round()
)
my_tib
# A tibble: 12 × 3
identifier year value
<chr> <dbl> <dbl>
1 a 2005 937
2 a 2010 1018
3 a 2015 916
4 a 2020 1160
5 b 2005 1033
6 b 2010 918
7 b 2015 1049
8 b 2020 1074
9 c 2005 1058
10 c 2010 969
11 c 2015 1151
12 c 2020 1039
Now I'd like to shrink down my tibble by taking the mean value
for two years each, creating a new column for the year bracket. For example, I'd like to take the mean of 937 and 1018 (977.5) for the new year_bracket
2005-2010.
I'd like to repeat this for all years and all identifiers.
So the first new 5 rows of my tibble look like this:
head(my_new_tib, 5)
# A tibble: 9 × 3
identifier year_bracket value
<chr> <chr> <dbl>
1 a 2005-2010 977.5
2 a 2010-2015 967
3 a 2015-2020 1038
4 b 2005-2010 975.5
5 b 2010-2015 983.5
Ideally, I'm looking for a piped dplyr solution but I'm also curious regarding other solutions.
CodePudding user response:
Using dplyr
:
library(dplyr)
my_tib |>
group_by(identifier) |>
mutate(value = (value lag(value))/2,
year_bracket = paste0(lag(year)," - ",year),
.keep = "unused",
.before = 2) |>
filter(!is.na(value)) |>
ungroup()
Output:
# A tibble: 9 x 3
identifier year_bracket value
<chr> <chr> <dbl>
1 a 2005 - 2010 978.
2 a 2010 - 2015 967
3 a 2015 - 2020 1038
4 b 2005 - 2010 976.
5 b 2010 - 2015 984.
6 b 2015 - 2020 1062.
7 c 2005 - 2010 1014.
8 c 2010 - 2015 1060
9 c 2015 - 2020 1095
CodePudding user response:
Another possible solution:
library(tidyverse)
my_tib %>%
group_by(identifier) %>%
slice(c(1, rep(2:(n()-1), each = 2) , n())) %>%
group_by(identifier, aux = rep(1:n(), each=2, length.out = n())) %>%
summarise(year_bracket = str_c(year, collapse = "_"), value = mean(value),
.groups = "drop") %>% select(-aux)
#> # A tibble: 9 × 3
#> identifier year_bracket value
#> <chr> <chr> <dbl>
#> 1 a 2005_2010 978.
#> 2 a 2010_2015 967
#> 3 a 2015_2020 1038
#> 4 b 2005_2010 976.
#> 5 b 2010_2015 984.
#> 6 b 2015_2020 1062.
#> 7 c 2005_2010 1014.
#> 8 c 2010_2015 1060
#> 9 c 2015_2020 1095