Home > Software design >  Create new data frame column based on conditioned column names in R
Create new data frame column based on conditioned column names in R

Time:01-10

I am working with a large dataset. This dataset has a column with a reference code and other columns whose values are TRUE or FALSE. I'm trying to create a new column that collects concatenated (or as a list) the names of the columns whose value is equal to TRUE. A simplification of the dataframe would be something like:

data <- data.frame( Reference = c("001", "002", "003", "004", "005"),
                 Column A = c(TRUE, TRUE, FALSE, TRUE, FALSE), 
                 Column B = c(FALSE, TRUE, TRUE, FALSE, FALSE), 
                 Column C = c(TRUE, FALSE, TRUE, FALSE, TRUE))
data

The expected result would be something like:

Reference Column A Column B Column C Column D
0001 TRUE FALSE TRUE Column A, Column C
0002 TRUE TRUE FALSE Column A, Column B
0003 FALSE TRUE TRUE Column B, Column C
0004 TRUE FALSE FALSE Column A
0005 FALSE FALSE TRUE Column C

I know how to create fields by concatenating values or column names but I don't know how to introduce the conditional so that it only takes the names of the columns whose value is TRUE.

Thank you so much in advance!!

CodePudding user response:

A tidyverse solution with tidyr::unite():

library(tidyverse)

data %>%
  mutate(unite(across(starts_with('Column'), ~ ifelse(.x, cur_column(), NA)),
               col = 'Column_D', sep = ', ', na.rm = TRUE))

#   Reference Column_A Column_B Column_C           Column_D
# 1       001     TRUE    FALSE     TRUE Column_A, Column_C
# 2       002     TRUE     TRUE    FALSE Column_A, Column_B
# 3       003    FALSE     TRUE     TRUE Column_B, Column_C
# 4       004     TRUE    FALSE    FALSE           Column_A
# 5       005    FALSE    FALSE     TRUE           Column_C

CodePudding user response:

Base R solution

data <- data.frame( Reference = c("001", "002", "003", "004", "005"),
                    Column_A = c(TRUE, TRUE, FALSE, TRUE, FALSE), 
                    Column_B = c(FALSE, TRUE, TRUE, FALSE, FALSE), 
                    Column_C = c(TRUE, FALSE, TRUE, FALSE, TRUE))

L <- apply(data, 1, function(i) which(i == TRUE))
data$Column_D <- lapply(L, function(x) paste0(names(x), collapse = ", "))

#   Reference Column_A Column_B Column_C           Column_D
# 1       001     TRUE    FALSE     TRUE Column_A, Column_C
# 2       002     TRUE     TRUE    FALSE Column_A, Column_B
# 3       003    FALSE     TRUE     TRUE Column_B, Column_C
# 4       004     TRUE    FALSE    FALSE           Column_A
# 5       005    FALSE    FALSE     TRUE           Column_C

CodePudding user response:

Another dplyr option:

data %>% 
  rowwise() %>% 
  mutate(Column_D = toString(names(across(starts_with("Column")))[which(c_across(starts_with("Column")))]))

  Reference Column_A Column_B Column_C Column_D          
  <chr>     <lgl>    <lgl>    <lgl>    <chr>             
1 001       TRUE     FALSE    TRUE     Column_A, Column_C
2 002       TRUE     TRUE     FALSE    Column_A, Column_B
3 003       FALSE    TRUE     TRUE     Column_B, Column_C
4 004       TRUE     FALSE    FALSE    Column_A          
5 005       FALSE    FALSE    TRUE     Column_C          
  • Related