Create new data frame boolean column based on dynamic number of other columns all being true

Time:08-04

I have a data frame which always starts with a target column, then an unknown number of other columns, all of booleans (results of agrep searches against a dynamic number of search patterns).

I need to create a column called final_result, which is TRUE if any of the boolean columns have a TRUE value in them. The number of boolean columns is unknown in advance as the data frame is created on the fly.

My rather naive approach was this:

target = c('blood', 'pressure','lymphatic')
result_1 = c(TRUE, TRUE, FALSE)
result_2 = c(TRUE, FALSE, FALSE)
# may be many more columns, unknown at runtime

df = data.frame(target, result_1, result_2)
df$final_result <- any(df[,2:ncol(df)])

but this returns TRUE in every row of final_result.

The last row, "lymphatic", has FALSE in both result columns, so it should return FALSE.

Any ideas appreciated.

CodePudding user response:

A possible solution, based on dplyr:

library(dplyr)

df %>% 
  mutate(new = rowSums(across(-target)) > 0)

#>      target result_1 result_2   new
#> 1     blood     TRUE     TRUE  TRUE
#> 2  pressure     TRUE    FALSE  TRUE
#> 3 lymphatic    FALSE    FALSE FALSE

CodePudding user response:

An approach that does not require any additional packages is:

df$final_result <- apply(df[,-1], 1, any)

The -1 means all of the columns except the first one. The apply function will convert the rest of the data frame into a matrix, then apply the any function to each row (the 2nd argument is 1 for rows, 2 for columns).
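As a rough sketch of what apply is doing with the question's data (the names m, row_any and col_any are only for illustration, not part of the answer):

```r
# Rebuild the example data frame from the question
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE)
)

# apply() coerces its input to a matrix; with only boolean columns left,
# that matrix is logical. drop = FALSE keeps a data frame even when a
# single column remains.
m <- as.matrix(df[, -1, drop = FALSE])
is.logical(m)

# MARGIN = 1 calls any() once per row, MARGIN = 2 once per column
row_any <- apply(m, 1, any)
col_any <- apply(m, 2, any)

df$final_result <- row_any
```

One thing to watch: if the data frame ever has just one boolean column, df[,-1] drops to a plain vector and apply will fail, so df[, -1, drop = FALSE] or df[-1] is the safer way to select.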

Another approach that does not convert to a matrix (so could be faster in some cases) is:

df$final_result <- Reduce(`|`, df[-1])

This treats the data frame as a list. It starts by checking whether the first column (after dropping "target") or the second is TRUE, then checks whether that result or the third column is TRUE, then compares that result with the fourth column, and so on until it runs out of columns.
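To make the step-by-step folding visible, here is a small sketch using accumulate = TRUE, which keeps every intermediate result. The result_3 column is a made-up addition to the question's data so that there is a third step to look at:

```r
# Question's data plus a hypothetical third boolean column (result_3)
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE),
  result_3 = c(FALSE, FALSE, TRUE)
)

# accumulate = TRUE returns a list with one element per step:
#   steps[[1]] is result_1
#   steps[[2]] is result_1 | result_2
#   steps[[3]] is (result_1 | result_2) | result_3
steps <- Reduce(`|`, df[-1], accumulate = TRUE)

# The last step is what Reduce returns without accumulate = TRUE
final <- steps[[length(steps)]]
df$final_result <- final
```

With the made-up result_3, the "lymphatic" row only becomes TRUE at the third step.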

If you want to use the tidyverse, then pmap_lgl from the purrr package can do this:

library(tidyverse)
df$final_result <- df[-1] %>% pmap_lgl(any) 

For any of these you can replace the -1 with 2:ncol(df) or with the results of which, grep, sapply etc. used to select columns.
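A quick sketch of those selection options on the question's data. It assumes the boolean columns all have names starting with "result", as in the example; the variable names by_position, by_name and by_type are only illustrative:

```r
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE)
)

# Three ways to get the indices of the boolean columns
by_position <- 2:ncol(df)                       # everything after the first column
by_name     <- grep("^result", names(df))       # names starting with "result"
by_type     <- which(sapply(df, is.logical))    # columns that are logical

# Any of the index vectors can replace -1 in the earlier approaches
df$final_result <- Reduce(`|`, df[by_type])
```

Selecting by type has to happen before final_result is added, since that new column is logical too and would be picked up on a second run.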

CodePudding user response:

In dplyr, we can use if_any:

library(dplyr)
df %>% 
   mutate(final_result = if_any(starts_with('result')))

-output

      target result_1 result_2 final_result
1     blood     TRUE     TRUE         TRUE
2  pressure     TRUE    FALSE         TRUE
3 lymphatic    FALSE    FALSE        FALSE