I have a data frame which always starts with a target column, then an unknown number of other columns, all of booleans (results of agrep searches against a dynamic number of search patterns).
I need to create a column called final_result, which is TRUE if any of the boolean columns have a TRUE value in them. The number of boolean columns is unknown in advance as the data frame is created on the fly.
My rather naive approach was this:
target = c('blood', 'pressure','lymphatic')
result_1 = c(TRUE, TRUE, FALSE)
result_2 = c(TRUE, FALSE, FALSE)
# may be many more columns, unknown at runtime
df = data.frame(target, result_1, result_2)
df$final_result <- any(df[,2:ncol(df)])
but this returns TRUE for every row, even though the last target, "lymphatic", has FALSE in both columns and so should return FALSE.
Any ideas appreciated.
CodePudding user response:
A possible solution, based on dplyr:
library(dplyr)
df %>%
mutate(new = rowSums(across(-target)) > 0)
#> target result_1 result_2 new
#> 1 blood TRUE TRUE TRUE
#> 2 pressure TRUE FALSE TRUE
#> 3 lymphatic FALSE FALSE FALSE
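The same row-sum idea can also be sketched in base R, in case dplyr is not available. This is only an illustration: it rebuilds the example data from the question, and the na.rm = TRUE setting is an assumption about how missing matches should be treated, not something the question specifies.

```r
# Example data from the question
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE)
)

# Logical values are counted as 0/1, so the row sum is the number of TRUEs.
# na.rm = TRUE (an assumption) makes an NA count as "no match".
n_true <- rowSums(df[-1], na.rm = TRUE)

# A row qualifies as soon as at least one column is TRUE
df$final_result <- n_true > 0
```

Because df[-1] keeps the data frame structure, this should also work when there is only a single boolean column.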
CodePudding user response:
An approach that does not require any additional packages is:
df$final_result <- apply(df[,-1], 1, any)
The -1 means all of the columns except the first one. The apply function will convert the rest of the data frame into a matrix, then apply the any function to each row (the 2nd argument is 1 for rows, 2 for columns).
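One thing to watch for, since the number of boolean columns is dynamic: with only one boolean column, df[,-1] collapses to a plain vector and apply stops with an error about dim(X). A minimal sketch of a guard, assuming you want the same code to handle that case (one_col is a hypothetical name used only for this illustration):

```r
# Example data from the question
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE)
)

# Hypothetical case: only one search pattern, so only one boolean column
one_col <- df[, c("target", "result_1")]

# drop = FALSE keeps the selection as a data frame even with one column,
# so apply() still receives something with two dimensions
flags <- one_col[, -1, drop = FALSE]
one_col$final_result <- apply(flags, 1, any)

# The same call is harmless when there are several boolean columns
df$final_result <- apply(df[, -1, drop = FALSE], 1, any)
```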
Another approach that does not convert to a matrix (so could be faster in some cases) is:
df$final_result <- Reduce(`|`, df[-1])
This treats the data frame as a list and starts by finding if the first column (after dropping "target") or the second is TRUE, then finds if that result or the third column is TRUE, then compares that result with the 4th column, and so on until it runs out of columns.
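To make that folding step concrete, here is a rough sketch that writes the same accumulation as an explicit loop and checks it against Reduce. The names flags and acc are made up for the illustration.

```r
# Example data from the question
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE)
)

flags <- df[-1]  # still a data frame, i.e. a list of columns

# Start with "no match yet" for every row, then OR in one column at a time
acc <- rep(FALSE, nrow(df))
for (col in flags) {
  acc <- acc | col
}

# Reduce() performs the same column-by-column OR in a single call
reduced <- Reduce(`|`, flags)
stopifnot(identical(acc, reduced))

df$final_result <- reduced
```

Since | is vectorised, each step works on whole columns at once, which is why no row-wise loop is needed.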
If you want to use the tidyverse, then pmap from the purrr package can do this:
library(tidyverse)
df$final_result <- df[-1] %>% pmap_lgl(any)
For any of these you can replace the -1 with 2:ncol(df) or with the results of which, grep, sapply etc. used to select columns.
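As a sketch of what those selectors might look like on the example data (the "^result_" pattern is an assumption based on the column names in the question, so adjust it to whatever your real columns are called):

```r
# Example data from the question
df <- data.frame(
  target   = c("blood", "pressure", "lymphatic"),
  result_1 = c(TRUE, TRUE, FALSE),
  result_2 = c(TRUE, FALSE, FALSE)
)

# Three ways to pick the boolean columns. Compute these before adding
# final_result, otherwise the by-type selector would pick it up as well.
by_position <- 2:ncol(df)                       # everything after target
by_name     <- grep("^result_", names(df))      # names starting with "result_"
by_type     <- which(sapply(df, is.logical))    # every logical column

# Each selector can be dropped into any of the approaches above
res_position <- Reduce(`|`, df[by_position])
res_name     <- Reduce(`|`, df[by_name])
res_type     <- Reduce(`|`, df[by_type])

df$final_result <- res_type
```

Selecting by name or by type is a bit more robust than -1 if the target column ever moves or extra non-boolean columns are added.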
CodePudding user response:
In dplyr, we can use if_any:
library(dplyr)
df %>%
mutate(final_result = if_any(starts_with('result')))
-output
target result_1 result_2 final_result
1 blood TRUE TRUE TRUE
2 pressure TRUE FALSE TRUE
3 lymphatic FALSE FALSE FALSE