Below is the sample data
year <- c (2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021)
indcode <- c(71,71,71,71,71,71,72,72,72,72,72,72,44,44,44,44,44,44,45,45,45,45,45,45)
avgemp <- c(44,46,48,50,55,56,10,11,12,13,14,15,21,22,22,23,25,25,61,62,62,63,69,77)
ownership <-c(50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)
test3 <- data.frame (year,indcode,avgemp,ownership)
The desired result is to have where it sums the avgemp for two specific combinations (71 72 and 44 45) and produces one additional row per year. The items in the parentheses (below) are just there to illustrate which numbers get added. The primary source of my confusion is how to have it select and therefore add certain indcode combinations. My initial thought is that you would pivot wider, add the columns, and the pivot_longer but hoping for something a bit less convoluted.
year indcode avgemp ownership
2016 71 72 54 (44 10) 50
2016 71 44 50
2016 72 10
2017 71 72 57 (46 11) 50
2018 71 72 60 (48 12) 50
2019 71 72 63 (50 13) 50
2020 71 72 69 (55 14) 50
2021 71 72 71 (56 15) 50
I know that it would start something like this
test3 <- test3 %>% group_by (indcode) %>% mutate("71 72" = (something that filters out 71 and 72)
CodePudding user response:
group_by(year, gr = indcode %/) %>%
summarise(indcode = paste(unique(indcode), collapse = ' '),
avgemp = sum(avgemp), ownership = ownership[1], .groups = 'drop') %>%
select(-gr)%>%
arrange(indcode)
# A tibble: 12 x 4
year indcode avgemp ownership
<dbl> <chr> <dbl> <dbl>
1 2016 44 45 82 50
2 2017 44 45 84 50
3 2018 44 45 84 50
4 2019 44 45 86 50
5 2020 44 45 94 50
6 2021 44 45 102 50
7 2016 71 72 54 50
8 2017 71 72 57 50
9 2018 71 72 60 50
10 2019 71 72 63 50
11 2020 71 72 69 50
12 2021 71 72 71 50
CodePudding user response:
Using data.table
- convert the data.frame to 'data.table' with setDT
, grouped by 'year', 'ownership', and the 'indcode' created by an ifelse/fcase method), get the sum
of 'avgemp' as a summarised output
library(data.table)
setDT(test3)[, .(avgemp = sum(avgemp)), .(year, ownership,
indcode = fcase(indcode %in% c(71, 72), '71 72', default = '44 45'))]
-output
year ownership indcode avgemp
<num> <num> <char> <num>
1: 2016 50 71 72 54
2: 2017 50 71 72 57
3: 2018 50 71 72 60
4: 2019 50 71 72 63
5: 2020 50 71 72 69
6: 2021 50 71 72 71
7: 2016 50 44 45 82
8: 2017 50 44 45 84
9: 2018 50 44 45 84
10: 2019 50 44 45 86
11: 2020 50 44 45 94
12: 2021 50 44 45 102