I am working with a large dataset. This dataset has a column with a reference code and other columns whose values are TRUE or FALSE. I'm trying to create a new column that collects concatenated (or as a list) the names of the columns whose value is equal to TRUE. A simplification of the dataframe would be something like:
data <- data.frame( Reference = c("001", "002", "003", "004", "005"),
Column A = c(TRUE, TRUE, FALSE, TRUE, FALSE),
Column B = c(FALSE, TRUE, TRUE, FALSE, FALSE),
Column C = c(TRUE, FALSE, TRUE, FALSE, TRUE))
data
The expected result would be something like:
Reference | Column A | Column B | Column C | Column D |
---|---|---|---|---|
0001 | TRUE | FALSE | TRUE | Column A, Column C |
0002 | TRUE | TRUE | FALSE | Column A, Column B |
0003 | FALSE | TRUE | TRUE | Column B, Column C |
0004 | TRUE | FALSE | FALSE | Column A |
0005 | FALSE | FALSE | TRUE | Column C |
I know how to create fields by concatenating values or column names but I don't know how to introduce the conditional so that it only takes the names of the columns whose value is TRUE.
Thank you so much in advance!!
CodePudding user response:
A tidyverse
solution with tidyr::unite()
:
library(tidyverse)
data %>%
mutate(unite(across(starts_with('Column'), ~ ifelse(.x, cur_column(), NA)),
col = 'Column_D', sep = ', ', na.rm = TRUE))
# Reference Column_A Column_B Column_C Column_D
# 1 001 TRUE FALSE TRUE Column_A, Column_C
# 2 002 TRUE TRUE FALSE Column_A, Column_B
# 3 003 FALSE TRUE TRUE Column_B, Column_C
# 4 004 TRUE FALSE FALSE Column_A
# 5 005 FALSE FALSE TRUE Column_C
CodePudding user response:
Base R solution
data <- data.frame( Reference = c("001", "002", "003", "004", "005"),
Column_A = c(TRUE, TRUE, FALSE, TRUE, FALSE),
Column_B = c(FALSE, TRUE, TRUE, FALSE, FALSE),
Column_C = c(TRUE, FALSE, TRUE, FALSE, TRUE))
L <- apply(data, 1, function(i) which(i == TRUE))
data$Column_D <- lapply(L, function(x) paste0(names(x), collapse = ", "))
# Reference Column_A Column_B Column_C Column_D
# 1 001 TRUE FALSE TRUE Column_A, Column_C
# 2 002 TRUE TRUE FALSE Column_A, Column_B
# 3 003 FALSE TRUE TRUE Column_B, Column_C
# 4 004 TRUE FALSE FALSE Column_A
# 5 005 FALSE FALSE TRUE Column_C
CodePudding user response:
Another dplyr
option:
data %>%
rowwise() %>%
mutate(Column_D = toString(names(across(starts_with("Column")))[which(c_across(starts_with("Column")))]))
Reference Column_A Column_B Column_C Column_D
<chr> <lgl> <lgl> <lgl> <chr>
1 001 TRUE FALSE TRUE Column_A, Column_C
2 002 TRUE TRUE FALSE Column_A, Column_B
3 003 FALSE TRUE TRUE Column_B, Column_C
4 004 TRUE FALSE FALSE Column_A
5 005 FALSE FALSE TRUE Column_C