Before running a statistical analysis, I would like to weight my sample by a variable (the population size of each areal unit), so that units with larger populations get larger weights and units with smaller populations get smaller ones. Do you have any suggestions on how to do this in R? Thanks in advance.
CodePudding user response:
You can do this with weighted.mean(), providing the weights as the second argument.
Here is a quick example, using population as weights.
dat <- data.frame(
  country = c("UK", "US", "France", "Zimbabwe"),
  pop = c(6.7e4, 3.31e8, 6.8e4, 1.5e4),
  love_of_british_royal_family = c(5, 9, 2, 1)
)
mean(dat$love_of_british_royal_family) # 4.25
weighted.mean(
  dat$love_of_british_royal_family,
  w = dat$pop
) # 8.997391
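To make it clearer what weighted.mean() is doing with the population weights, here is a minimal sketch that reproduces the result by hand. It reuses the toy values from the example above; the variable names x, w, manual, w_norm and normalised are introduced only for illustration.

```r
# Toy values copied from the example above
x <- c(5, 9, 2, 1)                    # scores
w <- c(6.7e4, 3.31e8, 6.8e4, 1.5e4)   # population weights

# Weighted mean by hand: sum of value * weight, divided by the total weight
manual <- sum(x * w) / sum(w)

# Normalised weights sum to 1; rescaling the weights does not change the result
w_norm <- w / sum(w)
normalised <- sum(x * w_norm)

all.equal(manual, weighted.mean(x, w))      # TRUE
all.equal(normalised, weighted.mean(x, w))  # TRUE
```

Because only the relative sizes of the weights matter, the raw populations can be passed directly without rescaling them first.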
CodePudding user response:
SamR's weighted.mean approach requires a weight for each element of your vector. If you have a population vector with many elements and want to weight by categories of population size, you could use the base R cut function. Here is a toy example:
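# No seed is set, so the sampled values differ from run to run
population <- sample(200:200000, 100)
df <- data.frame(population)
breaks <- c(200, 10000, 50000, 100000, 200000)
labels <- c(0.1, 0.2, 0.3, 0.4)
# include.lowest = TRUE keeps a population of exactly 200 in the first bin
# (by default it would fall outside the intervals and become NA)
cuts <- cut(df$population, breaks = breaks, labels = labels, include.lowest = TRUE)
df$weights <- as.numeric(as.character(cuts))
head(df)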
  population weights
1      25087     0.2
2      92652     0.3
3      99051     0.3
4     136376     0.4
5     184573     0.4
6     147675     0.4
Note that cuts is a factor. Calling as.numeric(cuts) directly would return the integer level codes (1 to 4) rather than the labels, so the as.character(cuts) conversion is required to keep the intended fractional weights.
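The conversion issue can be seen on a small hand-built factor. This is only an illustrative sketch: the three values and the names codes and values are made up for the demonstration, with the same labels as in the toy example.

```r
# Same weight labels as in the toy example
labels <- c(0.1, 0.2, 0.3, 0.4)

# A small factor standing in for the output of cut()
cuts <- factor(c("0.2", "0.4", "0.1"), levels = as.character(labels))

# Direct conversion returns the positions of the levels, not the labels
codes <- as.numeric(cuts)                 # 2 4 1

# Going through character first recovers the intended weights
values <- as.numeric(as.character(cuts))  # 0.2 0.4 0.1
```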