rename variables with variable labels in R

Time:07-14

I am forever working with collaborators in SPSS and Stata, so clear variable labels are really important to communicate what has been done to any given variable and what it records.

How do I most efficiently rename variables with their variable labels in a tidyverse context? I can do this, but it seems very unwieldy.

var1<-rnorm(100)
var2<-rnorm(100)
var3<-rnorm(100)
group_var<-sample(c("A", "B"), size=100, replace=TRUE)
other_var1<-rnorm(100)
other_var2<-rnorm(100)
df<-data.frame(var1, var2, var3, group_var, other_var1, other_var2)
library(labelled)
library(tidyverse)
df %>% 
  set_variable_labels(var1="Measure 1", 
                      var2="Measure 2",
                      var3="Measure 3",
                      group_var="Grouping Variable")->df


#Store variable labels
df %>% 
  select(starts_with("var")) %>% 
  var_label() %>% 
  unlist()->variable_labels
variable_labels<-data.frame(name=names(variable_labels), labels=variable_labels)
df %>% 
  pivot_longer(var1:var3) %>% 
  left_join(., variable_labels, by="name")
  

Is there a way to make the rename_with function work here? This does not do it.

df %>% 
  rename_with(., function(x) var_label(x),.cols=var1:var3)

CodePudding user response:

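We could use !!! with rename to splice in a named vector built from the variable_labels data frame. rename expects new_name = old_name, so the columns are reversed ([2:1]) before deframe turns them into a vector whose names are the labels and whose values are the current column names.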

library(dplyr)
library(tibble)
df <- df %>% 
   rename(!!! deframe(variable_labels[2:1]))

-Check the names

> names(df)
[1] "Measure 1"  "Measure 2"  "Measure 3"  "group_var"  "other_var1" "other_var2"

Or if we want to use rename_with

df <- df %>%
  rename_with(~ variable_labels$labels, 
      .cols = variable_labels$name)

var_label does not work here because it reads the label from the column itself, not from the column name. According to ?var_label, its argument is:

x - a vector or a data.frame

> var_label("var1")
NULL

whereas

> var_label(df$var1)
[1] "Measure 1"

Looking at the source of rename_with.data.frame makes this more evident

getAnywhere('rename_with.data.frame')
function (.data, .fn, .cols = everything(), ...) 
{
    .fn <- as_function(.fn)
    cols <- tidyselect::eval_select(enquo(.cols), .data)
    names <- names(.data)
    names[cols] <- .fn(names[cols], ...)
    names <- vec_as_names(names, repair = "check_unique")
    set_names(.data, names)
}

i.e. the .fn or lambda function is applied to the column names, not to the columns. var_label requires a data.frame or a labelled vector, so when it receives a plain character vector of names it returns NULL and the rename fails
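
Since .fn only ever sees names, one way to still reach the labels is to index back into the data frame from inside the lambda. The following is a minimal sketch, not part of the original answer: it uses a small hypothetical data frame (df_small) and assumes every selected column actually carries a label, since an unlabelled column would make the returned vector too short.

```r
library(dplyr)
library(labelled)

# Hypothetical toy data, labelled the same way as in the question
df_small <- data.frame(var1 = rnorm(5),
                       var2 = rnorm(5),
                       group_var = sample(c("A", "B"), 5, replace = TRUE))
df_small <- set_variable_labels(df_small,
                                var1 = "Measure 1",
                                var2 = "Measure 2")

# .x is c("var1", "var2"); df_small[.x] gives the columns themselves,
# so var_label() can read their labels
renamed <- df_small %>%
  rename_with(~ unname(unlist(var_label(df_small[.x]))), .cols = var1:var2)

names(renamed)
```

The lambda refers to df_small by name, so it has to be the same object that is piped in.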

-Adding print statements to a modified copy of the function shows what is passed to .fn

rename_with_mod <- function (.data, .fn, .cols = everything(), ...) 
{
   
    cols <- tidyselect::eval_select(enquo(.cols), .data)
    print("cols")
    print(cols)
    names <- names(.data)
    print("names")
    print(names)
    .fn <- rlang::as_function(.fn)
    print(names[cols])
    .fn(names[cols], ...)
    
}

-Testing

 # lambda function to return the column name
 > df %>% 
    rename_with_mod(~ .x, .cols=var1:var3)
[1] "cols"
var1 var2 var3 
   1    2    3 
[1] "names"
[1] "var1"       "var2"       "var3"       "group_var"  "other_var1" "other_var2"
[1] "var1" "var2" "var3"
[1] "var1" "var2" "var3"
# lambda function where we apply the var_label - returns NULL
> df %>% 
    rename_with_mod(~ var_label(.x), .cols=var1:var3)
[1] "cols"
var1 var2 var3 
   1    2    3 
[1] "names"
[1] "var1"       "var2"       "var3"       "group_var"  "other_var1" "other_var2"
[1] "var1" "var2" "var3"
NULL

CodePudding user response:

You could also use the attributes directly:

colnames(data) <- sapply(data, function(x) attr(x, "label"))

Or, if you prefer var_label and rename_with (beware that rename_with provides no data masking, so the function must refer to the data frame by name, data, rather than through the .data pronoun):

data |> 
  rename_with(function(x) sapply(x, function(y) var_label(data[[y]])))

Example with labelled haven iris data:

library(haven)

> path <- system.file("examples", "iris.dta", package = "haven")
> data <- read_dta(path)
> data
# A tibble: 150 × 5
   sepallength sepalwidth petallength petalwidth species
         <dbl>      <dbl>       <dbl>      <dbl> <chr>  
 1        5.10       3.5         1.40      0.200 setosa 
 2        4.90       3           1.40      0.200 setosa 
 3        4.70       3.20        1.30      0.200 setosa 
 4        4.60       3.10        1.5       0.200 setosa 
 5        5          3.60        1.40      0.200 setosa 
 6        5.40       3.90        1.70      0.400 setosa 
 7        4.60       3.40        1.40      0.300 setosa 
 8        5          3.40        1.5       0.200 setosa 
 9        4.40       2.90        1.40      0.200 setosa 
10        4.90       3.10        1.5       0.100 setosa 
# … with 140 more rows
> colnames(data) <- sapply(data, function(x) attr(x, "label"))
> data
# A tibble: 150 × 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <chr>  
 1         5.10        3.5          1.40       0.200 setosa 
 2         4.90        3            1.40       0.200 setosa 
 3         4.70        3.20         1.30       0.200 setosa 
 4         4.60        3.10         1.5        0.200 setosa 
 5         5           3.60         1.40       0.200 setosa 
 6         5.40        3.90         1.70       0.400 setosa 
 7         4.60        3.40         1.40       0.300 setosa 
 8         5           3.40         1.5        0.200 setosa 
 9         4.40        2.90         1.40       0.200 setosa 
10         4.90        3.10         1.5        0.100 setosa 
# … with 140 more rows
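
The iris example works because every column has a label. When some columns are unlabelled (as with other_var1 and other_var2 in the question), attr(x, "label") returns NULL for them. The following is a rough base R sketch, with a hypothetical data frame df_partial, that falls back to the existing column name in that case:

```r
# Hypothetical data: only column `a` carries a label
df_partial <- data.frame(a = 1:3, b = 4:6)
attr(df_partial$a, "label") <- "Measure A"

# For each column, use its label if present, otherwise keep the current name
new_names <- vapply(names(df_partial), function(nm) {
  lab <- attr(df_partial[[nm]], "label", exact = TRUE)
  if (is.null(lab)) nm else lab
}, character(1))

names(df_partial) <- unname(new_names)
names(df_partial)
```

vapply with character(1) also guards against a label that is not a single string, which sapply would pass through silently.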

Consider using janitor::make_clean_names afterwards to make life easier for yourself.
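
As a small illustration of that last suggestion (assuming the janitor package is installed), make_clean_names turns label-style names containing spaces and capitals into syntactic snake_case names:

```r
library(janitor)

# Label-style names like those produced above
label_names <- c("Measure 1", "Grouping Variable")

cleaned <- make_clean_names(label_names)
cleaned
```

Because it takes and returns a character vector of names, it can also be passed straight to rename_with, e.g. rename_with(data, janitor::make_clean_names).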
