Is there a way to set variable labels of a data frame using the map family?

I have a data set and an accompanying data dictionary, and I would like to use the dictionary to set the variable labels of the data set. I tried an explicit for loop, but it appears to be quite slow. Is there a way to use the map family from the tidyverse to achieve the same goal?

library(tidyverse)

mydata <- tibble(
  a_1 = c(20,22, 13,14,44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c("male", "female", "male", "female", "male"),
  c = c("Primary", "secondary", "Tertiary", "Primary", "Secondary")
)

dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select")
)


variables <- names(mydata)


for (var in variables) {
  # Strip the suffix so that a_1 and a_2 both map to "a"
  vm <- unique(str_remove_all(var, "_.*"))

  varlbl <- filter(dictionary, variable == vm) %>%
    select(label) %>%
    pull()

  attr(mydata[[var]], "label") <- varlbl
}


#---- Map the variable labels using map
#

CodePudding user response：

base R

mydata[] <- Map(
  function(x, lbl) if (!is.na(lbl)) `attr<-`(x, "label", lbl) else x,
  mydata,
  dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)]
)
str(mydata)
# tibble [5 x 4] (S3: tbl_df/tbl/data.frame)
#  $ a_1: num [1:5] 20 22 13 14 44
#   ..- attr(*, "label")= chr "Age"
#  $ a_2: num [1:5] 42 13 32 31 14
#   ..- attr(*, "label")= chr "Age"
#  $ b  : chr [1:5] "male" "female" "male" "female" ...
#   ..- attr(*, "label")= chr "Gender"
#  $ c  : chr [1:5] "Primary" "secondary" "Tertiary" "Primary" ...
#   ..- attr(*, "label")= chr "Education"

The mydata[] <- assignment is intentional and a small hack: with a plain mydata <- (no brackets), Map returns a list and the "frame" properties are lost. With mydata[] <-, only the contents (the columns) are replaced by the elements of that list, so mydata keeps its class and other frame-like properties.

I use this frequently when I want to (for example) convert a subset of columns to something else. I might do somedata[3:6] <- lapply(somedata[3:6], as.numeric), and I think it is much more readable than other methods to get the same effect.
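As a minimal sketch of that idiom, the snippet below uses a made-up somedata frame (the column names and values are assumptions for illustration only) and compares the result of lapply with and without brackets on the left-hand side.

```r
# Hypothetical example data: two columns stored as character that should be numeric
somedata <- data.frame(
  id    = c("x", "y", "z"),
  score = c("1.5", "2.5", "3.5"),
  count = c("10", "20", "30"),
  stringsAsFactors = FALSE
)

# Assigning the lapply() result to a new name gives a plain list
as_list <- lapply(somedata[2:3], as.numeric)
class(as_list)
# [1] "list"

# With brackets on the left-hand side, only the selected columns are replaced
somedata[2:3] <- lapply(somedata[2:3], as.numeric)
class(somedata)
# [1] "data.frame"
sapply(somedata, class)
#          id       score       count
# "character"   "numeric"   "numeric"
```

The same pattern works with empty brackets (somedata[] <- ...) when every column should be replaced.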

purrr

library(dplyr) # map2_dfc() needs dplyr installed: it binds the columns with dplyr::bind_cols()
library(purrr)
mydata <- map2_dfc(
  mydata,
  dictionary$label[ match(gsub("_.*", "", names(mydata)), dictionary$variable) ],
  ~ `attr<-`(.x, "label", .y))

For both, I'm using a shortcut "cheat": these two are equivalent:

{
  attr(x, "label") <- "something"
  x
}

## is equivalent to

{
  `attr<-`(x, "label", "something")
}

in that they both return the updated x. It's a little code-golf, a little aesthetics (reduced requirement for semicolons and braces), but you can easily shift to the more traditional (first) method if you prefer.
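A variation on the purrr approach, offered only as a sketch: recent purrr releases mark the _dfc variants as superseded, and map2_dfc as written above would attach an NA label to any column missing from the dictionary. Combining plain map2 with the mydata[] <- trick avoids both. The data here is a shortened stand-in for the question's data, and the z column is a hypothetical addition to show the unmatched case.

```r
library(purrr)

# Shortened stand-in for the question's data; `z` is a hypothetical column
# with no dictionary entry
mydata <- data.frame(
  a_1 = c(20, 22, 13),
  a_2 = c(42, 13, 32),
  b   = c("male", "female", "male"),
  z   = c(1, 2, 3),
  stringsAsFactors = FALSE
)
dictionary <- data.frame(
  variable = c("a", "b"),
  label    = c("Age", "Gender"),
  stringsAsFactors = FALSE
)

# One label per column, NA where the dictionary has no matching variable
labels <- dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)]

# map2() returns a list; assigning into mydata[] keeps the data frame class
mydata[] <- map2(mydata, labels, function(x, lbl) {
  if (!is.na(lbl)) attr(x, "label") <- lbl  # the "traditional" form of `attr<-`
  x
})

attr(mydata$a_1, "label")
# [1] "Age"
attr(mydata$z, "label")
# NULL
```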

CodePudding user response：

The labelVector package is another option that, while not as fast as Map, is a bit easier on the eyes (at least I think so):

library(labelVector)

idx <- match(gsub("_.*", "", names(mydata)), dictionary$variable)
var_label <- dictionary$label[idx]
names(var_label) <- names(mydata)

mydata <- set_label(mydata, .dots = var_label)
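All the answers here rely on the same lookup: strip everything from the first underscore, then match() the stem against dictionary$variable. The sketch below runs that step on its own with base R so the intermediate values are visible. The cols vector stands in for names(mydata), and d_1 is a hypothetical column name added to show what happens when the dictionary has no entry.

```r
dictionary <- data.frame(
  variable = c("a", "b", "c"),
  label    = c("Age", "Gender", "Education"),
  stringsAsFactors = FALSE
)

# Stand-in for names(mydata); "d_1" is hypothetical and not in the dictionary
cols <- c("a_1", "a_2", "b", "c", "d_1")

stems <- gsub("_.*", "", cols)
stems
# [1] "a" "a" "b" "c" "d"

# Position of each stem in the dictionary, NA when there is no match
idx <- match(stems, dictionary$variable)
idx
# [1]  1  1  2  3 NA

# Named vector of labels, one per column
var_label <- setNames(dictionary$label[idx], cols)
var_label
#         a_1         a_2           b           c         d_1
#       "Age"       "Age"    "Gender" "Education"          NA
```

The NA entry is why the base R answer guards with !is.na(lbl); unmatched columns may need to be dropped or handled before the labels are applied.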