I am needing to create a column called "combinations" in my data table that holds all values of the columns from column 4 to the end of my data table. I will use this line of code for multiple data tables and the number of columns will vary from data table to data table, so I do not always know the index number of the last column. The start will always be column 4.
I know of functions that work just fine using multiple column names, but not using multiple column indices. Does anyone know how to do this?
Example of something that would work using column names and not column indices:
mycols<-c("apple", "orange", "banana")
data[, combinations:=paste(mycols, sep=", ")]
Example of something I've tried using column indices that does not work:
ncols<-ncol(data)
my_cols <- data[ , c(4:ncols)]
data[, combinations:=paste(mycols, sep=", ")]
Example data
id number day apple orange banana
1 35 2 red orange yellow
2 12 3 red NA yellow
3 47 5 NA orange yellow
The final result I'm trying to accomplish
id number day apple orange banana combinations
1 35 2 red orange yellow red, orange, yellow
2 12 3 red NA yellow red, NA, yellow
3 47 5 NA orange yellow NA, orange, yellow
CodePudding user response:
We may need do.call
library(data.table)
data[, combinations := do.call(paste, c(.SD, sep = ", ")), .SDcols = 4:ncols]
-output
> data
id number day apple orange banana combinations
1: 1 35 2 red orange yellow red, orange, yellow
2: 2 12 3 red <NA> yellow red, NA, yellow
3: 3 47 5 <NA> orange yellow NA, orange, yellow
Or use unite
, which can remove the NA
elements with na.rm = TRUE
library(dplyr)
library(tidyr)
data %>%
unite(combinations, all_of(4:ncols), sep = ", ", na.rm = TRUE, remove = FALSE)
-output
id number day combinations apple orange banana
1: 1 35 2 red, orange, yellow red orange yellow
2: 2 12 3 red, yellow red <NA> yellow
3: 3 47 5 orange, yellow <NA> orange yellow
data
data <- structure(list(id = 1:3, number = c(35L, 12L, 47L), day = c(2L,
3L, 5L), apple = c("red", "red", NA), orange = c("orange", NA,
"orange"), banana = c("yellow", "yellow", "yellow")),
class = "data.frame", row.names = c(NA,
-3L))
setDT(data)
CodePudding user response:
With dplyr
, using rowwise
library(dplyr)
df %>%
rowwise() %>%
mutate(combinations = list(c_across(4:ncol({{df}})))) %>%
data.frame()
id number day apple orange banana combinations
1 1 35 2 red orange yellow red, orange, yellow
2 2 12 3 red <NA> yellow red, NA, yellow
3 3 47 5 <NA> orange yellow NA, orange, yellow
Data
df <- structure(list(id = 1:3, number = c(35L, 12L, 47L), day = c(2L,
3L, 5L), apple = c("red", "red", NA), orange = c("orange", NA,
"orange"), banana = c("yellow", "yellow", "yellow")), class = "data.frame", row.names = c(NA,
-3L))