Home > Software design >  R: Paste multiple column values together by index
R: Paste multiple column values together by index

Time:01-10

I am needing to create a column called "combinations" in my data table that holds all values of the columns from column 4 to the end of my data table. I will use this line of code for multiple data tables and the number of columns will vary from data table to data table, so I do not always know the index number of the last column. The start will always be column 4.

I know of functions that work just fine using multiple column names, but not using multiple column indices. Does anyone know how to do this?

Example of something that would work using column names and not column indices:

mycols<-c("apple", "orange", "banana")
data[, combinations:=paste(mycols, sep=", ")]

Example of something I've tried using column indices that does not work:

ncols<-ncol(data)
my_cols <- data[ , c(4:ncols)] 
data[, combinations:=paste(mycols, sep=", ")]

Example data

id  number  day apple  orange  banana  
1   35      2   red    orange  yellow
2   12      3   red    NA      yellow
3   47      5   NA     orange  yellow

The final result I'm trying to accomplish

id  number  day apple  orange  banana  combinations
1   35      2   red    orange  yellow  red, orange, yellow
2   12      3   red    NA      yellow  red, NA, yellow
3   47      5   NA     orange  yellow  NA, orange, yellow

CodePudding user response:

We may need do.call

library(data.table)
data[, combinations := do.call(paste, c(.SD, sep = ", ")), .SDcols = 4:ncols]

-output

> data
   id number day apple orange banana        combinations
1:  1     35   2   red orange yellow red, orange, yellow
2:  2     12   3   red   <NA> yellow     red, NA, yellow
3:  3     47   5  <NA> orange yellow  NA, orange, yellow

Or use unite, which can remove the NA elements with na.rm = TRUE

library(dplyr)
library(tidyr)
data %>% 
  unite(combinations, all_of(4:ncols), sep = ", ", na.rm = TRUE, remove = FALSE)

-output

    id number day        combinations apple orange banana
1:  1     35   2 red, orange, yellow   red orange yellow
2:  2     12   3         red, yellow   red   <NA> yellow
3:  3     47   5      orange, yellow  <NA> orange yellow

data

data <- structure(list(id = 1:3, number = c(35L, 12L, 47L), day = c(2L, 
3L, 5L), apple = c("red", "red", NA), orange = c("orange", NA, 
"orange"), banana = c("yellow", "yellow", "yellow")), 
class = "data.frame", row.names = c(NA, 
-3L))
setDT(data)

CodePudding user response:

With dplyr, using rowwise

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(combinations = list(c_across(4:ncol({{df}})))) %>% 
  data.frame()
  id number day apple orange banana        combinations
1  1     35   2   red orange yellow red, orange, yellow
2  2     12   3   red   <NA> yellow     red, NA, yellow
3  3     47   5  <NA> orange yellow  NA, orange, yellow

Data

df <- structure(list(id = 1:3, number = c(35L, 12L, 47L), day = c(2L, 
3L, 5L), apple = c("red", "red", NA), orange = c("orange", NA, 
"orange"), banana = c("yellow", "yellow", "yellow")), class = "data.frame", row.names = c(NA, 
-3L))
  • Related