R: label variables based on their prefix

Time:07-26

I'm new to R and am trying to label several, but not all, variables in my data at the same time. Specifically, I want to label the variables starting with "pol". I tried to combine select() and set_variable_labels() in the following manner:

cp14 <- cp14 %>% 
  select(matches("pol")) %>%
  set_variable_labels(cp14,
                      labels = "Interest in politics")

I would like all variables that include "pol" to be labelled as "Interest in politics". This, however, does not work. Any advice on how to do this, in a similar or a completely different manner, is greatly appreciated. My data looks something like this, but with many more variables:

structure(list(pol_interest_w1 = c(0.5, 0.5, 0.25, 0.25, 0.25, 
0.5), pol_interest_w2 = c(0.5, 0.5, 0.25, NA, 0.25, 0.5), pol_interest_w3 = c(0.5, 
0.5, 0.25, NA, 0, 0.5), pol_interest_w4 = c(0.5, 0.5, 0.25, NA, 
0, 0.5), pol_interest_w5 = c(0.5, 0.5, 0.25, NA, 0, 0.5), pol_interest_w6 = c(0.5, 
0.5, 0.25, NA, 0, 0.5), pol_interest_w7 = c(0.5, 0.5, 0.25, NA, 
0.25, 0.5), new_col = c(0.75, 0.5, 0.25, NA, 0.25, 0.5)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response:

You can do this in a couple of ways. For either solution, start by creating a vector of the variable names that start with "pol". (I use stringr::str_starts() here. You don’t want to use select(), as in your code: it subsets the dataset to the chosen columns rather than just identifying them.)

library(stringr)
library(labelled)

pol_vars <- names(cp14)[str_starts(names(cp14), "pol")]
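If you would rather not load stringr for this step, base R's startsWith() should give the same prefix match. The sketch below also shows why a bare pattern such as "pol" is broader than a prefix: as a regular expression it matches anywhere in the name. The column name monopoly_w1 is made up for illustration and is not in the original data.

```r
# Hypothetical column names; "monopoly_w1" is invented to show the difference
cols <- c("pol_interest_w1", "pol_interest_w2", "monopoly_w1", "new_col")

# Prefix match: keeps only names that begin with "pol"
prefix_hits <- cols[startsWith(cols, "pol")]

# Unanchored regex match: keeps any name containing "pol" somewhere
anywhere_hits <- grep("pol", cols, value = TRUE)

print(prefix_hits)
print(anywhere_hits)
```

If every variable containing "pol" really should be labelled, the unanchored match is fine; otherwise the prefix match is the safer choice.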

Then, you can make a named list of labels, and pass it to the .labels argument of labelled::set_variable_labels().

pol_labels <- setNames(
    as.list(rep("Interest in politics", length(pol_vars))),
    pol_vars
)

cp14 <- set_variable_labels(cp14, .labels = pol_labels)

Alternatively, you could loop over the variable names and assign labels using labelled::var_label().

for (v in pol_vars) {
    var_label(cp14[[v]]) <- "Interest in politics"
}
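As far as I know, var_label() stores the label in each column's "label" attribute, so the loop can be reproduced and checked with base R alone. This is a minimal sketch under that assumption; df is a small stand-in for cp14, not the full data.

```r
# Small stand-in for cp14 (a few rows of two columns from the question)
df <- data.frame(
  pol_interest_w1 = c(0.5, 0.5, 0.25),
  new_col = c(0.75, 0.5, 0.25)
)

pol_vars <- names(df)[startsWith(names(df), "pol")]

# Set the "label" attribute directly on each selected column
for (v in pol_vars) {
  attr(df[[v]], "label") <- "Interest in politics"
}

# Inspect the attribute on a labelled and an unlabelled column
print(attr(df$pol_interest_w1, "label"))
print(attr(df$new_col, "label"))
```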

Both approaches yield the same result:

#> var_label(cp14)

$pol_interest_w1
[1] "Interest in politics"

$pol_interest_w2
[1] "Interest in politics"

$pol_interest_w3
[1] "Interest in politics"

$pol_interest_w4
[1] "Interest in politics"

$pol_interest_w5
[1] "Interest in politics"

$pol_interest_w6
[1] "Interest in politics"

$pol_interest_w7
[1] "Interest in politics"

$new_col
NULL
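If you prefer to stay in a pipeline, as in the question, something along these lines may also work. It is only a sketch: it assumes dplyr 1.0 or later (for across()) and sets the "label" attribute directly instead of calling labelled. cp14_demo is a stand-in for cp14.

```r
library(dplyr)

# Stand-in for cp14 with two "pol" columns and one other column
cp14_demo <- tibble(
  pol_interest_w1 = c(0.5, 0.5, 0.25),
  pol_interest_w2 = c(0.5, 0.5, 0.25),
  new_col = c(0.75, 0.5, 0.25)
)

# Attach the label to every column whose name starts with "pol";
# all other columns are left untouched and stay in the data
cp14_demo <- cp14_demo %>%
  mutate(across(starts_with("pol"),
                ~ structure(.x, label = "Interest in politics")))

# Collect the "label" attribute of each column for inspection
col_labels <- lapply(cp14_demo, attr, "label")
print(col_labels)
```

Unlike select(), mutate(across(...)) keeps every column, so the result can be assigned back to the original object.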