I am forever working with collaborators who use SPSS and Stata, so clear variable labels are really important to communicate what has been done to any given variable and what it records.
How do I rename variables with their variable labels most efficiently in a tidyverse context? I can do this, but it seems very unwieldy.
library(labelled)
library(tidyverse)
var1 <- rnorm(100)
var2 <- rnorm(100)
var3 <- rnorm(100)
group_var <- sample(c("A", "B"), size = 100, replace = TRUE)
other_var1 <- rnorm(100)
other_var2 <- rnorm(100)
df <- data.frame(var1, var2, var3, group_var, other_var1, other_var2)
df <- df %>%
  set_variable_labels(var1 = "Measure 1",
                      var2 = "Measure 2",
                      var3 = "Measure 3",
                      group_var = "Grouping Variable")
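For context: labelled's set_variable_labels() stores each label in the column's "label" attribute, which is also where haven puts labels read from SPSS and Stata files. A minimal base R sketch of the same idea, using a hypothetical toy data frame and no packages, in case it helps to see where the labels actually live:

```r
# Hypothetical toy data; the label is a plain "label" attribute on each column,
# the same attribute that labelled::set_variable_labels() writes
toy <- data.frame(var1 = rnorm(5), var2 = rnorm(5))
attr(toy$var1, "label") <- "Measure 1"
attr(toy$var2, "label") <- "Measure 2"

# Read the labels back, one per column; vapply keeps the column names
labels <- vapply(toy, function(col) attr(col, "label"), character(1))
print(labels)
```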
# Store variable labels
variable_labels <- df %>%
  select(starts_with("var")) %>%
  var_label() %>%
  unlist()
variable_labels <- data.frame(name = names(variable_labels), labels = variable_labels)
df %>%
  pivot_longer(var1:var3) %>%
  left_join(variable_labels, by = "name")
Is there a way to make the rename_with function work here? This does not do it:
df %>%
  rename_with(function(x) var_label(x), .cols = var1:var3)
Answer:
We could use !!! with rename on a named vector created from the variable_labels dataset:
library(dplyr)
library(tibble)
df <- df %>%
rename(!!! deframe(variable_labels[2:1]))
Check the names:
> names(df)
[1] "Measure 1" "Measure 2" "Measure 3" "group_var" "other_var1" "other_var2"
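The !!! version works because rename() takes new_name = old_name pairs, so the spliced vector must map each label to the current column name; variable_labels[2:1] swaps the columns so that deframe() uses the labels as names. A base R sketch of the same mapping, using setNames() in place of tibble::deframe() and hypothetical toy copies of the lookup table and data:

```r
# Hypothetical copy of the lookup table built in the question
variable_labels <- data.frame(
  name   = c("var1", "var2", "var3"),
  labels = c("Measure 1", "Measure 2", "Measure 3"),
  stringsAsFactors = FALSE
)

# rename() expects new_name = old_name, so the labels become the names
rename_spec <- setNames(variable_labels$name, variable_labels$labels)

# Base R equivalent of rename(!!! rename_spec) on a hypothetical toy frame
toy <- data.frame(var1 = 1:2, var2 = 3:4, var3 = 5:6, group_var = c("A", "B"))
idx <- match(rename_spec, names(toy))
names(toy)[idx] <- names(rename_spec)
print(names(toy))
```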
Or, if we want to use rename_with (starting again from the original df):
df <- df %>%
  rename_with(~ variable_labels$labels,
              .cols = all_of(variable_labels$name))
The reason var_label is not working is that it needs the column values, not the column names. According to ?var_label:
x - a vector or a data.frame
> var_label("var1")
NULL
whereas
> var_label(df$var1)
[1] "Measure 1"
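The difference can be reproduced without labelled, assuming var_label() simply reads the "label" attribute (which is where labelled stores it). A column name is just a character string with no attributes, so there is nothing to find; a minimal sketch with a hypothetical vector col:

```r
# Hypothetical labelled column
col <- rnorm(3)
attr(col, "label") <- "Measure 1"

# The string "var1" carries no attributes, so the lookup returns NULL
from_name <- attr("var1", "label")
# The column vector itself carries the label
from_column <- attr(col, "label")
print(from_name)    # prints NULL
print(from_column)  # prints [1] "Measure 1"
```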
If we look at the source of rename_with.data.frame, this becomes more evident:
getAnywhere('rename_with.data.frame')
function (.data, .fn, .cols = everything(), ...)
{
.fn <- as_function(.fn)
cols <- tidyselect::eval_select(enquo(.cols), .data)
names <- names(.data)
names[cols] <- .fn(names[cols], ...)
names <- vec_as_names(names, repair = "check_unique")
set_names(.data, names)
}
i.e. .fn (the lambda function) is applied to the column names, which are plain character strings. var_label needs the data.frame or the column vector itself, so on the names it returns NULL and the rename fails.
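Since .fn only ever sees the names, the function has to index back into the data frame to fetch each column's label. A base R sketch that follows the names[cols] <- .fn(names[cols]) step shown above; label_lookup() and the toy data are hypothetical, and the labels are read with attr() on the assumption that they sit in the "label" attribute:

```r
# Hypothetical labelled toy data
toy <- data.frame(var1 = rnorm(3), var2 = rnorm(3), group_var = c("A", "B", "A"))
attr(toy$var1, "label") <- "Measure 1"
attr(toy$var2, "label") <- "Measure 2"

# Receives column *names*, so it looks each column up in toy
# and returns that column's "label" attribute
label_lookup <- function(nms) {
  vapply(nms, function(n) attr(toy[[n]], "label"), character(1),
         USE.NAMES = FALSE)
}

# Same steps as rename_with.data.frame: pick the columns, apply the
# function to their names, then write the new names back
nms <- names(toy)
cols <- match(c("var1", "var2"), nms)
nms[cols] <- label_lookup(nms[cols])
names(toy) <- nms
print(names(toy))
```

With dplyr loaded, the same function should plug into rename_with(toy, label_lookup, .cols = c(var1, var2)) on the original toy, because it maps names to labels rather than columns to labels.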
To see this, here is a modified version of the function with added print statements:
rename_with_mod <- function (.data, .fn, .cols = everything(), ...)
{
cols <- tidyselect::eval_select(enquo(.cols), .data)
print("cols")
print(cols)
names <- names(.data)
print("names")
print(names)
.fn <- rlang::as_function(.fn)
print(names[cols])
.fn(names[cols], ...)
}
Testing:
# lambda function to return the column name
> df %>%
rename_with_mod(~ .x, .cols=var1:var3)
[1] "cols"
var1 var2 var3
1 2 3
[1] "names"
[1] "var1" "var2" "var3" "group_var" "other_var1" "other_var2"
[1] "var1" "var2" "var3"
[1] "var1" "var2" "var3"
# lambda function where we apply the var_label - returns NULL
> df %>%
rename_with_mod(~ var_label(.x), .cols=var1:var3)
[1] "cols"
var1 var2 var3
1 2 3
[1] "names"
[1] "var1" "var2" "var3" "group_var" "other_var1" "other_var2"
[1] "var1" "var2" "var3"
NULL
Answer:
You could also use the attributes directly:
colnames(data) <- sapply(data, function(x) attr(x, "label"))
Or, if you prefer var_label and rename_with (beware, though, that there is no data masking available here, so refer to data, not .data):
data |>
rename_with(function(x) sapply(x, function(y) var_label(data[[y]])))
Example with the labelled iris data shipped with haven:
> library(haven)
> path <- system.file("examples", "iris.dta", package = "haven")
> data <- read_dta(path)
> data
# A tibble: 150 × 5
sepallength sepalwidth petallength petalwidth species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.10 3.5 1.40 0.200 setosa
2 4.90 3 1.40 0.200 setosa
3 4.70 3.20 1.30 0.200 setosa
4 4.60 3.10 1.5 0.200 setosa
5 5 3.60 1.40 0.200 setosa
6 5.40 3.90 1.70 0.400 setosa
7 4.60 3.40 1.40 0.300 setosa
8 5 3.40 1.5 0.200 setosa
9 4.40 2.90 1.40 0.200 setosa
10 4.90 3.10 1.5 0.100 setosa
# … with 140 more rows
> colnames(data) <- sapply(data, function(x) attr(x, "label"))
> data
# A tibble: 150 × 5
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <chr>
1 5.10 3.5 1.40 0.200 setosa
2 4.90 3 1.40 0.200 setosa
3 4.70 3.20 1.30 0.200 setosa
4 4.60 3.10 1.5 0.200 setosa
5 5 3.60 1.40 0.200 setosa
6 5.40 3.90 1.70 0.400 setosa
7 4.60 3.40 1.40 0.300 setosa
8 5 3.40 1.5 0.200 setosa
9 4.40 2.90 1.40 0.200 setosa
10 4.90 3.10 1.5 0.100 setosa
# … with 140 more rows
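The sapply() approach assumes every column has a label. If some do not (in the question, other_var1 and other_var2), attr() returns NULL for them and sapply() returns a list instead of a character vector. A minimal sketch, on hypothetical toy data, that falls back to the existing name for unlabelled columns:

```r
# Hypothetical toy data: var1 is labelled, other_var1 is not
toy <- data.frame(var1 = rnorm(3), other_var1 = rnorm(3))
attr(toy$var1, "label") <- "Measure 1"

# Use the "label" attribute when present, otherwise keep the current name
new_names <- vapply(names(toy), function(n) {
  lab <- attr(toy[[n]], "label")
  if (is.null(lab)) n else lab
}, character(1), USE.NAMES = FALSE)

names(toy) <- new_names
print(names(toy))
```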
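Consider using janitor::make_clean_names afterwards to make life easier for yourself, since labels such as "Measure 1" contain spaces and must be wrapped in backticks whenever they are used as column names.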
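If janitor is not available, base R's make.names() is a swapped-in alternative (not what janitor does, and its output style differs) that turns labels into syntactically valid names by replacing invalid characters such as spaces with dots. A minimal sketch using the labels from the question:

```r
# Labels from the question; spaces are not valid in syntactic R names
labels <- c("Measure 1", "Measure 2", "Grouping Variable")
clean <- make.names(labels)
print(clean)
```

make.names(labels, unique = TRUE) additionally de-duplicates repeated labels by appending suffixes.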