Varying the weights variable, based on the grouping variable-CodePudding

I have example data as follows:

table_selection <- structure(list(year = c(2006, 2006, 2006, 2006, 2006), Totaal_pop_weights = c(12.125, 
12.125, 12.125, 12.125, 12.125), Y02_pop_weights = c(97, 97, 
97, 97, 97), Y01_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), h10_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), A_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), B_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125, 
12.125), C_ha_pop_weights = c(97, 97, 97, 97, 97), D_ha_pop_weights = c(12.125, 
12.125, 12.125, 12.125, 12.125), variable = structure(c(2L, 1L, 
1L, 4L, 1L), levels = c("A_ha", "B_ha", "C_ha", 
"C_ha", "Y01", "Y02", "Totaal", "X10"), class = "factor"), 
    value = c(2, 3, 1, 1, 12.9)), row.names = c(NA, -5L), class = c("data.table", 
"data.frame"))


   year Totaal_pop_weights Y02_pop_weights Y01_pop_weights h10_pop_weights A_ha_pop_weights B_ha_pop_weights
1: 2006             12.125              97          12.125          12.125           12.125           12.125
2: 2006             12.125              97          12.125          12.125           12.125           12.125
3: 2006             12.125              97          12.125          12.125           12.125           12.125
4: 2006             12.125              97          12.125          12.125           12.125           12.125
5: 2006             12.125              97          12.125          12.125           12.125           12.125
   C_ha_pop_weights D_ha_pop_weights variable value
1:               97           12.125     B_ha   2.0
2:               97           12.125     A_ha   3.0
3:               97           12.125     A_ha   1.0
4:               97           12.125     C_ha   1.0
5:               97           12.125     A_ha  12.9

I would like to weight the observations as follows:

weights_of_interest <- select(table_selection, contains(c("weights")))
table_selection <- table_selection %>%
    group_by(year, variable) %>%
    summarize(weighted_mean = weighted_mean(value, w = Y01_pop_weights , na.rm=TRUE),
              weighted_se = weighted_se(value, w = Y01_pop_weights , na.rm=TRUE))

But this uses the same weight all the time Y01_pop_weights. How do I change the weight so that the value where variable is A_ha uses A_ha_pop_weights as a weight.

CodePudding user response：

If you want a tidyverse solution, I think the way to go is to use tidyr to turn the data into long format. My computer dont know the functions 'weighed_mean' or 'weighed_se', so I am not 100% sure this would work.

library(magrittr)
table_selection %>% 
  tidyr::pivot_longer(cols = tidyselect::contains("weights"),
                      values_to = "pop_values",
                      names_to = "NAMES") %>% 
  dplyr::group_by(year, variable, NAMES) %>%
  dplyr::summarize(weighted_mean = weighted_mean(value, w = pop_values, na.rm=TRUE),
weighted_se = weighted_se(value, w = pop_values , na.rm=TRUE))

But using weighed.mean from the stats package ...

table_selection %>% 
  tidyr::pivot_longer(cols = tidyselect::contains("weights"),
                      values_to = "pop_values",
                      names_to = "NAMES") %>% 
  dplyr::group_by(year, variable, NAMES) %>%
  dplyr::summarize(weighted_mean = stats::weighted.mean(value, w = pop_values , na.rm=TRUE),
                   #weighted_se = weighted_se(value, w = pop_values , na.rm=TRUE))

returns:

# A tibble: 24 x 4
# Groups:   year, variable [3]
    year variable NAMES              weighted_mean
   <dbl> <fct>    <chr>                      <dbl>
 1  2006 A_ha     A_ha_pop_weights            5.63
 2  2006 A_ha     B_ha_pop_weights            5.63
 3  2006 A_ha     C_ha_pop_weights            5.63
 4  2006 A_ha     D_ha_pop_weights            5.63
 5  2006 A_ha     h10_pop_weights             5.63
 6  2006 A_ha     Totaal_pop_weights          5.63
 7  2006 A_ha     Y01_pop_weights             5.63
 8  2006 A_ha     Y02_pop_weights             5.63
 9  2006 B_ha     A_ha_pop_weights            2   
10  2006 B_ha     B_ha_pop_weights            2   
# ... with 14 more rows

CodePudding user response：

If table_selection is a data.table (as your example data suggests), you can create a new single column wt that holds the pop weight value according to the value in variable

table_selection[
  ,
  wt:=.SD[[paste0(variable,"_pop_weights")]][1],
  by = 1:nrow(table_selection),
  .SDcols = patterns("ha_pop_weights")
]

Here is the same approach using dplyr (rowwise() and cacross())

# helper function
f <- function(d,v) d[[paste0(v,"_pop_weights")]][1]

# vector of wt variable names
ha_wts = names(table_selection)[grepl("ha_pop_weights$", names(table_selection))]

# mutate the `wt` column
table_selection %>% 
  rowwise() %>% 
  mutate(wt = f(setNames(c_across(all_of(ha_wts)), ha_wts),variable))

Using either approach, you may then use w=wt in your call to summarize() above.