How to assign weights to a sample in R

Before performing a statistical analysis, I would like to weight my sample as a function of a variable (the population size of each areal unit), so that units with a larger population get a greater weight and units with a smaller population get a smaller one. Do you have any suggestions on how to do this in R? Thanks in advance.

CodePudding user response：

You can do this with weighted.mean(), providing the weights as the second argument.

Here is a quick example, using population as weights.

dat <- data.frame(
    country = c("UK", "US", "France", "Zimbabwe"),
    pop = c(6.7e4, 3.31e8, 6.8e4, 1.5e4),
    love_of_british_royal_family = c(5, 9, 2, 1)
)

mean(dat$love_of_british_royal_family) # 4.25

weighted.mean(
    dat$love_of_british_royal_family, 
    w = dat$pop
) # 8.997391
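To make explicit what the weighting does, here is a minimal sketch under stated assumptions: the `units` data frame and its values are hypothetical toy data, not taken from the question. It rescales population to weights that sum to 1, computes the weighted mean by hand, and checks that against weighted.mean(). It also shows that an intercept-only lm() with a `weights` argument returns the same number, which is one way weights can be carried into a later analysis.

```r
# Hypothetical toy data: four areal units with a population and a measured value
units <- data.frame(
    pop   = c(1000, 5000, 20000, 74000),
    value = c(2, 4, 6, 8)
)

# Weights proportional to population, rescaled so they sum to 1
units$w <- units$pop / sum(units$pop)

# Weighted mean by hand: sum of value times normalised weight
manual <- sum(units$value * units$w)

# weighted.mean() normalises internally, so raw populations can be passed directly
builtin <- weighted.mean(units$value, w = units$pop)

# An intercept-only weighted regression estimates the same weighted mean
fit <- lm(value ~ 1, data = units, weights = pop)
from_lm <- unname(coef(fit)[1])

all.equal(manual, builtin) # TRUE
all.equal(manual, from_lm) # TRUE
```

Whether raw population is the right weight for a given analysis depends on the model, so treat this as an illustration of the mechanics rather than a recommendation.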

CodePudding user response：

SamR's weighted.mean() approach requires a weight for each element of your vector. If you have a population vector with many elements and want to weight by categories of population size instead, you could use the base R cut() function. Here is a toy example:

population <- sample(200:200000, 100)
df <- data.frame(population)
breaks <- c(200, 10000, 50000, 100000, 200000)
labels <- c(0.1, 0.2, 0.3, 0.4)
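# include.lowest = TRUE puts a population of exactly 200 in the first bin instead of NA
cuts <- cut(df$population, breaks = breaks, labels = labels, include.lowest = TRUE)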
df$weights <- as.numeric(as.character(cuts))
head(df)
  population weights
1      25087     0.2
2      92652     0.3
3      99051     0.3
4     136376     0.4
5     184573     0.4
6     147675     0.4

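Note that cuts is a factor, not a numeric vector. Calling as.numeric() on it directly would return the underlying integer level codes (1, 2, 3, 4) rather than the labels, so the as.character(cuts) conversion is required to recover the intended fractional weights.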
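The factor conversion can be checked on a small case. This is a minimal sketch; the factor `f` and the variable names are made up for illustration.

```r
# A small factor whose labels look like numbers (hypothetical example)
f <- factor(c("0.2", "0.4", "0.1"), levels = c("0.1", "0.2", "0.3", "0.4"))

# Direct conversion returns the integer level codes, not the labels
codes <- as.numeric(f)               # 2 4 1

# Converting through character recovers the labelled values
wts <- as.numeric(as.character(f))   # 0.2 0.4 0.1

# Equivalent idiom: convert the levels once, then index by the factor
wts_alt <- as.numeric(levels(f))[f]  # 0.2 0.4 0.1
```

The second idiom converts only the distinct levels rather than every element, which can matter for long vectors.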