I have a dataset and an accompanying data dictionary, and I would like to use the dictionary to set the variable labels of the dataset. I tried an explicit for loop, but it appears to be quite slow. Is there a way to use the map family from tidyverse to achieve the same goal?
library(tidyverse)
mydata <- tibble(
  a_1 = c(20, 22, 13, 14, 44),
  a_2 = c(42, 13, 32, 31, 14),
  b = c("male", "female", "male", "female", "male"),
  c = c("Primary", "secondary", "Tertiary", "Primary", "Secondary")
)
dictionary <- tibble(
  variable = c("a", "b", "c"),
  label = c("Age", "Gender", "Education"),
  type = c("mselect", "select", "select")
)
variables <- names(mydata)
for (var in variables) {
  vm <- unique(str_remove_all(var, "_.*")) # Take care of the variables with _
  varlbl <- filter(dictionary, variable == vm) %>%
    select(label) %>%
    pull()
  attr(mydata[[var]], "label") <- varlbl
}
#---- Map the variable labels using map
#
CodePudding user response:
base R
mydata[] <- Map(
  function(x, lbl) if (!is.na(lbl)) `attr<-`(x, "label", lbl) else x,
  mydata,
  dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)]
)
str(mydata)
# tibble [5 x 4] (S3: tbl_df/tbl/data.frame)
# $ a_1: num [1:5] 20 22 13 14 44
# ..- attr(*, "label")= chr "Age"
# $ a_2: num [1:5] 42 13 32 31 14
# ..- attr(*, "label")= chr "Age"
# $ b : chr [1:5] "male" "female" "male" "female" ...
# ..- attr(*, "label")= chr "Gender"
# $ c : chr [1:5] "Primary" "secondary" "Tertiary" "Primary" ...
# ..- attr(*, "label")= chr "Education"
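The lookup inside the Map call above can be checked on its own. This is a minimal base R sketch, assuming plain vectors in place of the tibble columns; the names col_names, dict_variable, dict_label and the extra column "d_1" are made up for illustration. It also shows why the function passed to Map tests is.na(lbl): a column with no dictionary entry gets NA.

```r
# "d_1" is a hypothetical column with no dictionary entry.
col_names <- c("a_1", "a_2", "b", "c", "d_1")
dict_variable <- c("a", "b", "c")
dict_label <- c("Age", "Gender", "Education")

# Drop everything from the first underscore onward to get the stem.
stems <- gsub("_.*", "", col_names)

# Position of each stem in the dictionary (NA when there is no match).
idx <- match(stems, dict_variable)

# One label per column, in column order.
labels <- dict_label[idx]
labels
# "Age" "Age" "Gender" "Education" NA
```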
The mydata[] <- reassignment is intentional and a small hack: if we do mydata <- (no brackets), then Map returns a list and the "frame" properties are lost. However, mydata[] <- replaces only the contents (columns) with the new list, so the frame-like properties of mydata are preserved.
I use this frequently when I want to (for example) convert a subset of columns to something else. I might do somedata[3:6] <- lapply(somedata[3:6], as.numeric), and I think it is much more readable than other methods to get the same effect.
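A minimal sketch of the difference, using a small made-up data frame (the names df, as_list and kept are hypothetical, and a plain data.frame stands in for the tibble so that only base R is needed):

```r
# Hypothetical toy data; the columns hold numbers stored as strings.
df <- data.frame(x = c("1", "2", "3"), y = c("4", "5", "6"),
                 stringsAsFactors = FALSE)

# Plain assignment: lapply() returns a bare list, so the frame class is lost.
as_list <- lapply(df, as.numeric)
class(as_list)
# "list"

# Empty-bracket assignment: only the columns are replaced, the frame is kept.
kept <- df
kept[] <- lapply(kept, as.numeric)
class(kept)
# "data.frame"
```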
purrr
library(dplyr) # map2_dfc() needs dplyr installed to bind the columns
library(purrr)
mydata <- map2_dfc(
  mydata,
  dictionary$label[match(gsub("_.*", "", names(mydata)), dictionary$variable)],
  ~ `attr<-`(.x, "label", .y)
)
For both, I'm using a shortcut "cheat": these two are equivalent:
{
attr(x, "label") <- "something"
x
}
## is equivalent to
{
`attr<-`(x, "label", "something")
}
in that they both return the updated x. It's a little code-golf, a little aesthetics (reduced requirement for semicolons and braces), but you can easily shift to the more traditional (first) method if you prefer.
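The equivalence can be checked directly. This is a small sketch with a made-up vector; the names traditional and functional are only for illustration:

```r
x <- c(20, 22, 13)

# Traditional form: assign the attribute on a copy, then return the copy.
traditional <- local({
  y <- x
  attr(y, "label") <- "Age"
  y
})

# Functional form: calling `attr<-` directly returns the updated vector.
functional <- `attr<-`(x, "label", "Age")

identical(traditional, functional)
# TRUE
```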
CodePudding user response:
The labelVector package is another option that, while not as fast as Map, is a bit easier on the eyes (at least I think so):
library(labelVector)
idx <- match(gsub("_.*", "", names(mydata)), dictionary$variable)
var_label <- dictionary$label[idx]
names(var_label) <- names(mydata)
mydata <- set_label(mydata, .dots = var_label)
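Whichever approach is used, the labels can be read back in one step as an alternative to scanning the str() output. This is a base R sketch on a hand-labelled toy frame (labelled and found are hypothetical names standing in for the labelled mydata); note that vapply() would raise an error here if any column had no "label" attribute.

```r
# Toy frame labelled by hand, standing in for the labelled mydata.
labelled <- data.frame(a_1 = c(20, 22), b = c("male", "female"),
                       stringsAsFactors = FALSE)
attr(labelled$a_1, "label") <- "Age"
attr(labelled$b, "label") <- "Gender"

# Collect each column's "label" attribute into a named character vector.
found <- vapply(labelled, function(col) attr(col, "label"), character(1))
found
```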