Home > database >  How best to do row operations in R
How best to do row operations in R

Time:07-19

Below is the sample data

   year <- c (2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021,2016,2017,2018,2019,2020,2021)
  indcode <- c(71,71,71,71,71,71,72,72,72,72,72,72,44,44,44,44,44,44,45,45,45,45,45,45)
  avgemp <- c(44,46,48,50,55,56,10,11,12,13,14,15,21,22,22,23,25,25,61,62,62,63,69,77)
  ownership <-c(50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50,50)

  test3 <- data.frame (year,indcode,avgemp,ownership)

The desired result is to have where it sums the avgemp for two specific combinations (71 72 and 44 45) and produces one additional row per year. The items in the parentheses (below) are just there to illustrate which numbers get added. The primary source of my confusion is how to have it select and therefore add certain indcode combinations. My initial thought is that you would pivot wider, add the columns, and the pivot_longer but hoping for something a bit less convoluted.

  year     indcode        avgemp          ownership
  2016       71 72         54 (44 10)         50
  2016         71          44                 50
  2016         72          10
  2017       71 72         57 (46 11)         50
  2018       71 72         60 (48 12)         50
  2019       71 72         63 (50 13)         50
  2020       71 72         69 (55 14)         50
  2021       71 72         71 (56 15)         50

I know that it would start something like this

 test3 <- test3 %>% group_by (indcode) %>% mutate("71 72" = (something that filters out 71 and  72) 
  

CodePudding user response:

  group_by(year, gr = indcode %/) %>%
   summarise(indcode = paste(unique(indcode), collapse = ' '),
             avgemp = sum(avgemp), ownership = ownership[1], .groups = 'drop') %>%
   select(-gr)%>%
   arrange(indcode)

# A tibble: 12 x 4
    year indcode avgemp     ownership
   <dbl> <chr>    <dbl>          <dbl>
 1  2016 44 45       82             50
 2  2017 44 45       84             50
 3  2018 44 45       84             50
 4  2019 44 45       86             50
 5  2020 44 45       94             50
 6  2021 44 45      102             50
 7  2016 71 72       54             50
 8  2017 71 72       57             50
 9  2018 71 72       60             50
10  2019 71 72       63             50
11  2020 71 72       69             50
12  2021 71 72       71             50

CodePudding user response:

Using data.table - convert the data.frame to 'data.table' with setDT, grouped by 'year', 'ownership', and the 'indcode' created by an ifelse/fcase method), get the sum of 'avgemp' as a summarised output

library(data.table)
setDT(test3)[, .(avgemp = sum(avgemp)), .(year, ownership, 
  indcode = fcase(indcode %in% c(71, 72), '71 72', default = '44 45'))]

-output

   year ownership indcode avgemp
    <num>     <num>  <char>  <num>
 1:  2016        50   71 72     54
 2:  2017        50   71 72     57
 3:  2018        50   71 72     60
 4:  2019        50   71 72     63
 5:  2020        50   71 72     69
 6:  2021        50   71 72     71
 7:  2016        50   44 45     82
 8:  2017        50   44 45     84
 9:  2018        50   44 45     84
10:  2019        50   44 45     86
11:  2020        50   44 45     94
12:  2021        50   44 45    102
  • Related