Avoid assigning label to dataframe columns one by one for large number of columns

Time:08-17

This is a dataframe I want to label. The labels are going to come from a column in another dataframe.

  a b c
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4

  variable  label
1        a label1
2        b label2
3        c label3

These are my attempts so far: labeling each column individually (which is not feasible, since my actual data has many columns), a loop, and the papeR package (which I strongly want to avoid, because it works one time and fails the next, or I am not applying it correctly).

library(papeR)
library(Hmisc)
df <- data.frame(variable = c("a", "b", "c"),
                 label = c("label1", "label2", "label3"))
data <- data.frame(a = 1:4, b = 1:4, c = 1:4)

#### the classic column-by-column labeling
#### but my actual dataset has many columns
Hmisc::label(data$a) <- df[1,2]
Hmisc::label(data$b) <- df[2,2]
Hmisc::label(data$c) <- df[3,2]
data


##### I want to somehow achieve this using Hmisc preferably
for(i in 1:ncol(data)){
       
   Hmisc::label(data[i]) <- df[i,2]
}
data


#### papeR is acting up, so I do not want to use it:
#### sometimes it works, sometimes it does not
papeR::labels(data) <- df$label  # this makes data a ldf
data <- as.data.frame(data)
data

CodePudding user response:

There is no need for any loop. Just set label(data, self = FALSE) <- value.

label(data, self = FALSE) <- c("label1", "label2", "label3")
Check:
label(data)
#        a        b        c 
# "label1" "label2" "label3"

str(data)
# 'data.frame': 4 obs. of  3 variables:
#  $ a: 'labelled' int  1 2 3 4
#   ..- attr(*, "label")= chr "label1"
#  $ b: 'labelled' int  1 2 3 4
#   ..- attr(*, "label")= chr "label2"
#  $ c: 'labelled' int  1 2 3 4
#   ..- attr(*, "label")= chr "label3"

The key is to set self = FALSE; otherwise label(data) <- value will set the label of the data itself, not labels of its columns.
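
The call above assigns labels by position, so it relies on the rows of the label data frame being in the same order as the columns of data. As a hedged sketch (the shuffled row order and the helper variable idx are assumptions made for illustration), the labels can first be aligned by name with match() and then assigned in one step:

```r
library(Hmisc)

# Label table whose rows are deliberately NOT in column order (illustrative)
df <- data.frame(variable = c("c", "a", "b"),
                 label = c("label3", "label1", "label2"),
                 stringsAsFactors = FALSE)
data <- data.frame(a = 1:4, b = 1:4, c = 1:4)

# For each column of `data`, find the matching row of `df`
idx <- match(names(data), df$variable)
stopifnot(!anyNA(idx))  # stop early if any column has no label row

# Assign all column labels at once, now in column order
label(data, self = FALSE) <- df$label[idx]
label(data)
```

The stopifnot() guard is a design choice for this sketch: it fails loudly when a column has no entry in the label table rather than assigning a missing label.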

CodePudding user response:

I think you were already close to your desired solution. You only had to change data[i] to data[[i]] in the for loop.

library(Hmisc)
#> Loading required package: lattice
#> Loading required package: survival
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
df <- data.frame(variable = c("a", "b", "c"),
                 label = c("label1", "label2", "label3"))
data <- data.frame(a = 1:4, b = 1:4, c = 1:4)

# The only change needed: data[i] becomes data[[i]]
for (i in seq_len(ncol(data))) {
  Hmisc::label(data[[i]]) <- df[i, 2]
}

str(data)
#> 'data.frame':    4 obs. of  3 variables:
#>  $ a: 'labelled' int  1 2 3 4
#>   ..- attr(*, "label")= chr "label1"
#>  $ b: 'labelled' int  1 2 3 4
#>   ..- attr(*, "label")= chr "label2"
#>  $ c: 'labelled' int  1 2 3 4
#>   ..- attr(*, "label")= chr "label3"
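
The difference between the two forms can be seen in base R alone: data[i] is a one-column data frame, while data[[i]] is the column vector itself. The sketch below is only an illustration of that difference. It sets the "label" attribute (the one visible in the str() output above) directly with attr(), which is an assumption made for demonstration and not a substitute for Hmisc::label().

```r
data <- data.frame(a = 1:4, b = 1:4, c = 1:4)

# Single brackets return a one-column data frame
class(data[1])    # "data.frame"

# Double brackets return the column vector itself
class(data[[1]])  # "integer"

# Assigning through [[ ]] therefore modifies the column, not a wrapper
attr(data[[1]], "label") <- "label1"
attr(data$a, "label")  # "label1"
```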

CodePudding user response:

I really like the {labelled} package. It provides set_variable_labels(), which takes a list of variable-name/label pairs and has a .strict argument that controls whether every name in the list has to be present in the data set. It is also pipeable, which is very convenient.

library(labelled)
labs <- list("a" = "label1", "b" = "label2", c = "label3", 'not' = 'there')
data <- data.frame(a = 1:4, b = 1:4, c = 1:4)

data_labelled <- set_variable_labels(data, .labels = labs, .strict = FALSE)
unlist(var_label(data_labelled))
#>        a        b        c 
#> "label1" "label2" "label3"

Created on 2022-08-17 by the reprex package (v2.0.1)

In case that helps, here's how to convert your label data frame to the desired list object using deframe() and as.list():

library(tibble)
data.frame(
  variable = c("a", "b", "c"),
  label = c("label1", "label2", "label3")
) %>%  
  deframe() %>%  
  as.list()
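
If you would rather not load tibble for this step, the same named list can probably be built in base R. This is a minimal sketch, assuming the label data frame has the variable and label columns shown in the question; the name labs is reused from the example above.

```r
df <- data.frame(variable = c("a", "b", "c"),
                 label = c("label1", "label2", "label3"),
                 stringsAsFactors = FALSE)

# Name each label by its variable, then turn the named vector into a list
labs <- as.list(setNames(df$label, df$variable))
str(labs)
```

The resulting labs should be usable as the .labels argument of set_variable_labels() in the same way as the hand-written list.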