I would like to create a new column describing which binary attributes are present in a given sample using the corresponding category names.
Here's a sample of my data:
sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1
Ideally I'd like to create a type column that lists, by name, every variable that is set to 1, separated by commas, like this:
type
type_3
type_1, type_2
type_1, type_2, type_3
I've tried doing that with mutate and if_else, by creating three additional columns that hold either the type name or zero, but when I concatenate the values I get multiple commas in a row wherever there are zeros.
CodePudding user response:
library(dplyr)
df <- read.table(text = "sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1", header = TRUE)
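df %>%
  rowwise() %>%
  mutate(
    # which() returns the positions of the 1s among the type columns
    # (sample_id is dropped with [-1L]); the "type_" prefix makes the
    # labels match the column names.
    # Note: cur_data() is deprecated as of dplyr 1.1.0 in favour of pick().
    type = toString(paste0("type_", which(cur_data()[-1L] == 1)))
  )
# sample_id type_1 type_2 type_3 type
# <int> <int> <int> <int> <chr>
# 1 1 0 0 1 type_3
# 2 2 1 1 0 type_1, type_2
# 3 3 1 1 1 type_1, type_2, type_3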
CodePudding user response:
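Here's a base R option:
# Get the names of the columns that contain "type"
cols <- grep('type', names(df), value = TRUE)
# Get the row/column indices of every cell equal to 1
mat <- which(df[cols] == 1, arr.ind = TRUE)
# For each row, combine the matching column names.
# This assumes every row has at least one 1; otherwise the result is
# shorter than nrow(df) and the assignment fails.
df$type <- tapply(mat[, 2], mat[, 1], function(x) toString(cols[x]))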
df
# sample_id type_1 type_2 type_3 type
#1 1 0 0 1 type_3
#2 2 1 1 0 type_1, type_2
#3 3 1 1 1 type_1, type_2, type_3
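If some samples might have no attribute set at all, a row-wise apply is one way to sidestep the length mismatch, since it returns one value per row regardless. This is only a sketch: the fourth, all-zero sample below is hypothetical and was added to show the edge case, and the '^type_' pattern assumes the attribute columns all start with that prefix.

```r
# Sample data from the question plus a hypothetical all-zero row (sample 4)
df <- read.table(text = "sample_id type_1 type_2 type_3
1 0 0 1
2 1 1 0
3 1 1 1
4 0 0 0", header = TRUE)

# Attribute columns, matched by prefix (assumed naming convention)
cols <- grep('^type_', names(df), value = TRUE)

# For each row, keep the names of the columns equal to 1 and join them;
# a row with no 1s yields an empty string instead of being dropped
df$type <- unname(apply(df[cols] == 1, 1, function(x) toString(cols[x])))
df
```

Rows without any 1 end up with an empty string in type, which you could replace with NA afterwards if that suits your data better.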