R set binary flag based on contents of other columns

Time:07-26

I have tried the dplyr solution from "R Set Column Value based on other Column Values", but when I run it the columns are all repeated as THING$THING1 etc. What I need is a single column that is 1 when any of the columns THING1:THING4 contains a 1, and 0 when none of them do. (The data uses 1 for yes and 0 for no as answers to a series of related questions.)

Thing1 Thing2 Thing3 Thing4
0 0 0 0
1 0 0 0
0 1 0 1
0 0 0 0
0 0 1 0
1 0 0 0
0 0 0 0

And I want to get the column:

Thing
0
1
1
0
1
1
0

The code as I'm using it is:

Things <- dataset %>%
  select(c(THING1:THING4)) %>%
  mutate(THING = across(.cols = THING1:THING4,
                .fns = ~ if_else(.x == 1|is.na(.x),
                                 1,
                                 0)))

I am using a vector as the real data has about a dozen columns to check.

CodePudding user response:

Here is one potential solution:

library(dplyr)

df <- read.table(text = "Thing1 Thing2  Thing3  Thing4
0   0   0   0
1   0   0   0
0   1   0   1
0   0   0   0
0   0   1   0
1   0   0   0
0   0   0   0", header = TRUE)

df %>%
  mutate(flag = as.numeric(if_any(starts_with("Thing"), ~.x == 1)))
#>   Thing1 Thing2 Thing3 Thing4 flag
#> 1      0      0      0      0    0
#> 2      1      0      0      0    1
#> 3      0      1      0      1    1
#> 4      0      0      0      0    0
#> 5      0      0      1      0    1
#> 6      1      0      0      0    1
#> 7      0      0      0      0    0

Created on 2022-07-26 by the reprex package (v2.0.1)
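
Since the real data reportedly has about a dozen columns, the same idea can also be driven by a character vector of column names instead of a name prefix. A minimal sketch, assuming the column names are known up front (the vector name flag_cols is hypothetical and not from the question):

```r
library(dplyr)

df <- data.frame(
  Thing1 = c(0, 1, 0, 0, 0, 1, 0),
  Thing2 = c(0, 0, 1, 0, 0, 0, 0),
  Thing3 = c(0, 0, 0, 0, 1, 0, 0),
  Thing4 = c(0, 0, 1, 0, 0, 0, 0)
)

# Hypothetical vector naming the columns to check
flag_cols <- paste0("Thing", 1:4)

# all_of() selects exactly the named columns; if_any() is TRUE
# for a row when the condition holds in at least one of them
res <- df %>%
  mutate(flag = as.numeric(if_any(all_of(flag_cols), ~ .x == 1)))

res$flag
```

all_of() raises an error if one of the listed names is missing from the data, which can help catch typos in a longer list.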

Edit

The code in your question suggests the data can contain NAs too. If you want to ignore them, i.e. treat an NA as "not 1" (shown here on data where Thing4 is NA in rows 4 and 5), you could use:

df %>%
  mutate(flag = as.numeric(if_any(starts_with("Thing"), ~.x == 1 & !is.na(.x))))
#>   Thing1 Thing2 Thing3 Thing4 flag
#> 1      0      0      0      0    0
#> 2      1      0      0      0    1
#> 3      0      1      0      1    1
#> 4      0      0      0     NA    0
#> 5      0      0      1     NA    1
#> 6      1      0      0      0    1
#> 7      0      0      0      0    0
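
The data with NAs is not defined above, so here is a small self-contained sketch that rebuilds it from the printed output and compares the plain comparison with an NA-safe one. Using %in% in place of the explicit !is.na() check is a swapped-in alternative, not the code from the answer:

```r
library(dplyr)

# Same values as the printed output above, with NA in Thing4 for rows 4 and 5
df_na <- data.frame(
  Thing1 = c(0, 1, 0, 0, 0, 1, 0),
  Thing2 = c(0, 0, 1, 0, 0, 0, 0),
  Thing3 = c(0, 0, 0, 0, 1, 0, 0),
  Thing4 = c(0, 0, 1, NA, NA, 0, 0)
)

res <- df_na %>%
  mutate(
    # Plain comparison: NA == 1 is NA, so a row holding only 0s and an NA
    # may come out as NA rather than a clean 0
    flag_raw = as.numeric(if_any(starts_with("Thing"), ~ .x == 1)),
    # %in% never returns NA, so an NA is simply treated as "not 1"
    flag_in = as.numeric(if_any(starts_with("Thing"), ~ .x %in% 1))
  )

res[, c("flag_raw", "flag_in")]
```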

CodePudding user response:

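Your code applies the same operation to each of several columns instead of summarizing them into one: across() returns a data frame with one column per input column, which is why you end up with THING$THING1 etc. If you want to use across (rather than if_any), you could combine it with rowSums: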

df |>
  mutate(Thing = as.numeric(rowSums(across(Thing1:Thing4), na.rm = TRUE) >= 1))
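
To make the difference concrete, the following sketch (on a hypothetical two-column toy data frame, not the data from the question) shows that assigning the result of across() to a single name stores a whole data frame in that column, whereas wrapping it in rowSums() reduces it to one value per row:

```r
library(dplyr)

df <- data.frame(Thing1 = c(0, 1, 0), Thing2 = c(0, 0, 1))

# Naming the result of across() stores a data frame inside one column
nested <- df |>
  mutate(Thing = across(Thing1:Thing2, ~ if_else(.x == 1, 1, 0)))
is.data.frame(nested$Thing)

# rowSums() collapses that data frame to a single number per row
flat <- df |>
  mutate(Thing = as.numeric(rowSums(across(Thing1:Thing2), na.rm = TRUE) >= 1))
flat$Thing
```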

If you want to adapt your own implementation, you could use rowwise() and any(). Like your original condition, this flags a row when any value is 1 or NA:

df |>
  #select(c(Thing1:Thing4)) |>
  rowwise() |>
  mutate(Thing = if_else(any(c_across(Thing1:Thing4) == 1) | any(is.na(c_across(Thing1:Thing4))), 1, 0)) |>
  ungroup()
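
Note that the two snippets in this answer treat NA differently: the rowSums version ignores NAs, while the rowwise version counts an NA as a hit. A small sketch on hypothetical toy data that puts both behaviours side by side, using a vectorized if_any() condition as a stand-in for the rowwise() code:

```r
library(dplyr)

df_na <- data.frame(
  Thing1 = c(0, 1, 0),
  Thing2 = c(0, 0, NA)
)

res <- df_na |>
  mutate(
    # NA ignored: only an actual 1 sets the flag
    na_ignored = as.numeric(rowSums(across(Thing1:Thing2), na.rm = TRUE) >= 1),
    # NA counted as a hit, mirroring the condition in the question
    na_as_one = as.numeric(if_any(Thing1:Thing2, ~ .x == 1 | is.na(.x)))
  )

res
```

Which of the two is right depends on whether a missing answer should count as a yes in your data.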

Data from @jared_mamrot
