Concatenate binary column names with commas in dplyr

Time:09-28

I would like to create a new column describing which binary attributes are present in a given sample using the corresponding category names.

Here's a sample of my data

sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1

Ideally I'd like to create a type column that summarizes all the variables by name, separated by commas, like this:

type
type_3
type_1, type_2
type_1, type_2, type_3

I've tried doing that with mutate and if_else, by creating three additional columns that hold either the column name or zero, but when I concatenate those values I get several commas in a row wherever there are zeros.

CodePudding user response:

library(dplyr)

df <- read.table(text = "sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1", header = TRUE)

df %>% 
  rowwise() %>% 
  mutate(
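    # which() gives the positions of the 1s among the non-id columns; these
    # match the type_ suffixes here because the columns are in order.
    # Note: cur_data() is deprecated as of dplyr 1.1.0 in favor of pick().
    type = toString(paste0("type_", which(cur_data()[-1L] == 1)))
  )
#   sample_id type_1 type_2 type_3 type                  
#       <int>  <int>  <int>  <int> <chr>                 
# 1         1      0      0      1 type_3                
# 2         2      1      1      0 type_1, type_2        
# 3         3      1      1      1 type_1, type_2, type_3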
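
If you'd rather stay close to the mutate/if_else attempt described in the question, here is a rough sketch of one way it might work: fill the helper columns with NA instead of zero, then let tidyr::unite() drop the NAs while joining. The tmp_ prefix and the out name are just placeholders I made up, and this assumes dplyr >= 1.0 and tidyr >= 1.0 are installed.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1", header = TRUE)

out <- df %>%
  # Helper columns (hypothetical tmp_ prefix): the column name where the
  # value is 1, NA otherwise
  mutate(across(
    starts_with("type_"),
    ~ if_else(.x == 1, cur_column(), NA_character_),
    .names = "tmp_{.col}"
  )) %>%
  # Join the helper columns; na.rm = TRUE skips the NAs, so no stray commas.
  # unite() removes the helper columns by default.
  unite("type", starts_with("tmp_"), sep = ", ", na.rm = TRUE)

out
```

The original 0/1 columns are left untouched, and a sample with no attributes set should simply end up with an empty string.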

CodePudding user response:

Here's a base R option -

# Get the names of the columns that contain "type"
cols <- grep('type', names(df), value = TRUE)
# Get the row/column indices where a 1 is present
mat <- which(df[cols] == 1, arr.ind = TRUE)
# For each row, combine the matching column names
df$type <- tapply(mat[, 2], mat[, 1], function(x) toString(cols[x]))
df

#  sample_id type_1 type_2 type_3                   type
#1         1      0      0      1                 type_3
#2         2      1      1      0         type_1, type_2
#3         3      1      1      1 type_1, type_2, type_3
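
One caveat, as far as I can tell: the tapply() version only returns a value for rows that contain at least one 1, so the assignment can fail if some sample has no attribute set. Below is a minimal sketch of a variant using apply() that should cope with that case. The fourth row is made-up data added purely to illustrate it, and the "^type_" pattern is my own assumption about how the columns are named.

```r
# Sample data from the question plus a hypothetical fourth row with no 1s
df <- read.table(text = "sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1
4 0 0 0", header = TRUE)

# Columns whose names start with "type_"
cols <- grep("^type_", names(df), value = TRUE)

# For each row, keep the names of the columns equal to 1 and join them.
# toString() on an empty character vector gives "", so row 4 stays empty.
df$type <- apply(df[cols] == 1, 1, function(present) toString(cols[present]))
df
```

Rows without any attribute get an empty string here; swap in NA afterwards if that suits your data better.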