I have example data as follows:
table_selection <- structure(list(year = c(2006, 2006, 2006, 2006, 2006), Totaal_pop_weights = c(12.125,
12.125, 12.125, 12.125, 12.125), Y02_pop_weights = c(97, 97,
97, 97, 97), Y01_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), h10_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), A_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), B_ha_pop_weights = c(12.125, 12.125, 12.125, 12.125,
12.125), C_ha_pop_weights = c(97, 97, 97, 97, 97), D_ha_pop_weights = c(12.125,
12.125, 12.125, 12.125, 12.125), variable = structure(c(2L, 1L,
1L, 4L, 1L), levels = c("A_ha", "B_ha", "C_ha",
"C_ha", "Y01", "Y02", "Totaal", "X10"), class = "factor"),
value = c(2, 3, 1, 1, 12.9)), row.names = c(NA, -5L), class = c("data.table",
"data.frame"))
year Totaal_pop_weights Y02_pop_weights Y01_pop_weights h10_pop_weights A_ha_pop_weights B_ha_pop_weights
1: 2006 12.125 97 12.125 12.125 12.125 12.125
2: 2006 12.125 97 12.125 12.125 12.125 12.125
3: 2006 12.125 97 12.125 12.125 12.125 12.125
4: 2006 12.125 97 12.125 12.125 12.125 12.125
5: 2006 12.125 97 12.125 12.125 12.125 12.125
C_ha_pop_weights D_ha_pop_weights variable value
1: 97 12.125 B_ha 2.0
2: 97 12.125 A_ha 3.0
3: 97 12.125 A_ha 1.0
4: 97 12.125 C_ha 1.0
5: 97 12.125 A_ha 12.9
I would like to weight the observations as follows:
weights_of_interest <- select(table_selection, contains(c("weights")))
table_selection <- table_selection %>%
group_by(year, variable) %>%
summarize(weighted_mean = weighted_mean(value, w = Y01_pop_weights , na.rm=TRUE),
weighted_se = weighted_se(value, w = Y01_pop_weights , na.rm=TRUE))
But this uses the same weight all the time Y01_pop_weights
. How do I change the weight so that the value where variable is A_ha
uses A_ha_pop_weights
as a weight.
CodePudding user response:
If you want a tidyverse solution, I think the way to go is to use tidyr to turn the data into long format. My computer dont know the functions 'weighed_mean' or 'weighed_se', so I am not 100% sure this would work.
library(magrittr)
table_selection %>%
tidyr::pivot_longer(cols = tidyselect::contains("weights"),
values_to = "pop_values",
names_to = "NAMES") %>%
dplyr::group_by(year, variable, NAMES) %>%
dplyr::summarize(weighted_mean = weighted_mean(value, w = pop_values, na.rm=TRUE),
weighted_se = weighted_se(value, w = pop_values , na.rm=TRUE))
But using weighed.mean from the stats package ...
table_selection %>%
tidyr::pivot_longer(cols = tidyselect::contains("weights"),
values_to = "pop_values",
names_to = "NAMES") %>%
dplyr::group_by(year, variable, NAMES) %>%
dplyr::summarize(weighted_mean = stats::weighted.mean(value, w = pop_values , na.rm=TRUE),
#weighted_se = weighted_se(value, w = pop_values , na.rm=TRUE))
returns:
# A tibble: 24 x 4
# Groups: year, variable [3]
year variable NAMES weighted_mean
<dbl> <fct> <chr> <dbl>
1 2006 A_ha A_ha_pop_weights 5.63
2 2006 A_ha B_ha_pop_weights 5.63
3 2006 A_ha C_ha_pop_weights 5.63
4 2006 A_ha D_ha_pop_weights 5.63
5 2006 A_ha h10_pop_weights 5.63
6 2006 A_ha Totaal_pop_weights 5.63
7 2006 A_ha Y01_pop_weights 5.63
8 2006 A_ha Y02_pop_weights 5.63
9 2006 B_ha A_ha_pop_weights 2
10 2006 B_ha B_ha_pop_weights 2
# ... with 14 more rows
CodePudding user response:
If table_selection
is a data.table (as your example data suggests), you can create a new single column wt
that holds the pop weight value according to the value in variable
table_selection[
,
wt:=.SD[[paste0(variable,"_pop_weights")]][1],
by = 1:nrow(table_selection),
.SDcols = patterns("ha_pop_weights")
]
Here is the same approach using dplyr
(rowwise()
and cacross()
)
# helper function
f <- function(d,v) d[[paste0(v,"_pop_weights")]][1]
# vector of wt variable names
ha_wts = names(table_selection)[grepl("ha_pop_weights$", names(table_selection))]
# mutate the `wt` column
table_selection %>%
rowwise() %>%
mutate(wt = f(setNames(c_across(all_of(ha_wts)), ha_wts),variable))
Using either approach, you may then use w=wt
in your call to summarize()
above.