R - refer to column names rather than column index when using lapply with data frame

Time:07-19

I am using lapply to take values from specific columns of a data frame and change them from a 1-5 scale to the inverse (i.e., 1 becomes 5, 2 becomes 4). I have managed to do this by referring to the column index:

df_vars[,c(104:183, 222:249, 271:290)] <- lapply(df_vars[,c(104:183, 222:249, 271:290)],
                                                 FUN = function(x) misty::item.reverse(x, min = 1, max = 5))

I want to do the same thing using column names instead. I cannot simply select all numeric columns, or all columns whose values range from 1 to 5, because not every 1-5 column needs inverting. I may also need to drop columns and rerun this code, which would shift the column positions, so I would like to refer to the columns by name.
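One way to make that switch, sketched here on the small example data shown below: look the names up once from the current positions, print them with `dput()`, and paste the resulting vector into the script. The positions `c(1:2, 4, 6:8)` stand in for the real ranges, and the variable name `cols` is only an illustrative choice.

```r
# Example data (same values as the example data frame in the question)
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

# One-off step: translate the current column positions into names.
# c(1:2, 4, 6:8) stands in for c(104:183, 222:249, 271:290).
cols <- names(df)[c(1:2, 4, 6:8)]

# dput() prints the vector as R code, ready to paste into the script:
# c("A", "B", "D", "F", "G", "H")
dput(cols)

# Later runs can then select by name, e.g. df_vars[cols] <- lapply(df_vars[cols], ...),
# so dropping other columns no longer shifts which columns are reversed.
```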

To illustrate, here is some example data:

# create example data frame
df <- data.frame("A" = c(1, 3, 5),
                 "B" = c(1, 2, 3),
                 "C" = c(4, 2, 1),
                 "D" = c(3, 2, 5),
                 "E" = c(5, 5, 4),
                 "F" = c(1, 2, 1),
                 "G" = c(3, 4, 3),
                 "H" = c(4, 3, 2))

# for this example, only A, B, D, F, G and H need to be inverted

This is a small data frame, but my real data frame is much larger, with over 100 columns to invert, so assume the example is too big to realistically handle one column at a time.

Using the example data and specified columns to invert, the desired output would be the following data frame:

# transformed data frame
df <- data.frame("A" = c(5, 3, 1),
                 "B" = c(5, 4, 3),
                 "C" = c(4, 2, 1),
                 "D" = c(3, 4, 1),
                 "E" = c(5, 5, 4),
                 "F" = c(5, 4, 5),
                 "G" = c(3, 2, 3),
                 "H" = c(2, 3, 4))

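The rule behind the desired output can be written as a one-line formula: on a scale from min to max, each value is reflected around the midpoint of the scale. For example, column A's values 1, 3, 5 become 5, 3, 1. This is a sketch of the arithmetic only, not a claim about how `misty::item.reverse()` is implemented internally.

$$
x' = (\min + \max) - x, \qquad \text{so for a scale from 1 to 5, } x' = 6 - x .
$$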
I tried using grep to get the column index using the column names. Based on the example data, the code I tried was:

df[, colnames(select(df, "A":"B", "D", "F":"H"))] <- lapply(grep(colnames(select(df, c("A":"B", "D", "F":"H"))), df),
                                                            FUN = function(x) misty::item.reverse(x, min = 1, max = 5))

This did not work. Testing the grep function on its own gave this:

> grep(colnames(select(df, c("A":"B", "D", "F":"H"))), df)
integer(0)
Warning message:
In grep(colnames(select(df, c("A":"B", "D", "F":"H"))), df) :
  argument 'pattern' has length > 1 and only the first element will be used
> 

Any ideas? Thank you.

Answer:

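A possible solution, based on dplyr. Indexing the reversed vector 5:1 by each value maps 1 to 5, 2 to 4, and so on (this assumes every value is a whole number from 1 to 5). Only the columns that need inverting are selected, so C and E are left unchanged:

library(dplyr)

df %>% 
  mutate(across(c(A:B, D, F:H), ~ (5:1)[.x]))

#>   A B C D E F G H
#> 1 5 5 4 3 5 5 3 2
#> 2 3 4 2 4 5 4 2 3
#> 3 1 3 1 1 4 5 3 4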
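Since the question also mentions dropping columns and rerunning the code, a variant that stores the column names in a character vector may be more robust. This sketch uses tidyselect's `any_of()` (re-exported by dplyr), which skips names that are no longer present, and swaps the indexing trick for the equivalent arithmetic `6 - .x` on a 1-5 scale. The vector name `cols` and the dropped column `G` are illustrative only.

```r
library(dplyr)

df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

# Names of the columns to reverse-code, stored once
cols <- c("A", "B", "D", "F", "G", "H")

# any_of() ignores names missing from the data, so this still runs
# after a column in `cols` has been dropped
df_rev <- df %>%
  select(-G) %>%                           # simulate a dropped column
  mutate(across(any_of(cols), ~ 6 - .x))   # 6 - x reverses a 1-5 scale

df_rev
```

If a missing column should instead stop the script, `all_of()` raises an error for names that are not found.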

Answer:

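You can use sapply() with a character vector of column names, as follows. The drawback is that base R has no easy way to specify a range of columns (such as A:B) by name, so every name has to be listed. (The \(x) shorthand for function(x) requires R 4.1.0 or later.)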

cols <- c("A", "B", "D", "F", "G", "H")

df[,cols] <- sapply(df[,cols], \(x) (5:1)[x])
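Because the question asks about lapply() specifically, here is a base R sketch of the same idea using `lapply()` and single-bracket indexing. `df[cols]` always returns a data frame (even for a single column), and `lapply()` returns a list that is assigned back column by column. `intersect()` drops any names no longer in the data, which may help if columns are removed later. The arithmetic `6 - x` stands in for the question's `misty::item.reverse(x, min = 1, max = 5)`, which could be used in its place.

```r
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

cols <- c("A", "B", "D", "F", "G", "H")

# Keep only names that still exist in df (guards against dropped columns)
cols <- intersect(cols, names(df))

# Reverse-code a 1-5 scale: 1 -> 5, 2 -> 4, ..., 5 -> 1
df[cols] <- lapply(df[cols], function(x) 6 - x)

df
```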

The easiest way to select a range of columns by name is tidyselect's eval_select(), which returns their positions as numbers. But if you go this far, you might as well use the dplyr solution; this is essentially what dplyr does under the hood.

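library(tidyselect)
library(rlang)  # provides expr(), which captures the selection unevaluated

# Run on the original df: eval_select() returns a named integer vector
# of positions (A = 1, B = 2, D = 4, F = 6, G = 7, H = 8)
col_pos <- eval_select(expr(c(A:B, D, F:H)), df)

df[,col_pos] <- sapply(df[,col_pos], \(x) (5:1)[x])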
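The grep() attempt in the question was reaching for column positions, but grep() expects a single pattern (hence the warning) and was given the data frame itself rather than names(df), so it searched the column contents instead of the names. When the columns are listed individually rather than as ranges, base R's `match()` turns names into positions directly, without tidyselect. A minimal sketch on the example data:

```r
df <- data.frame(A = c(1, 3, 5), B = c(1, 2, 3), C = c(4, 2, 1),
                 D = c(3, 2, 5), E = c(5, 5, 4), F = c(1, 2, 1),
                 G = c(3, 4, 3), H = c(4, 3, 2))

cols <- c("A", "B", "D", "F", "G", "H")

# match() looks up each name in names(df) and returns its position
col_pos <- match(cols, names(df))
col_pos  # 1 2 4 6 7 8

# Reverse the 1-5 scale in the selected columns only
df[col_pos] <- lapply(df[col_pos], function(x) (5:1)[x])
```

Note that `match()` returns `NA` for a name that is not in the data frame, and an `NA` position makes the assignment fail; if columns may be dropped, `which(names(df) %in% cols)` simply skips absent names.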