Home > Net >  create variable with case_when/if_any across multiple variables
create variable with case_when/if_any across multiple variables

Time:01-30

I have a dataset with names from some variables corresponding to each month (e.g. var1_M1, var1_M2 var2_M1 ...) and I need to summarize by each month (after conditional) and identify the first case when I find a value > 90 and finally indicate the corresponding month.

library(dplyr)

ID <- c("Dave", "Joe", "Steve")
var1_M1 <- c(10, 10, 90)
var1_M2 <- c(30, 90, 95)
var1_M3 <- c(90, 100, 95)
var2_M1 <- c(10, 90, 20)
var2_M2 <- c(33, 10, 100)
var2_M3 <- c(90, 10, 50)

data <- tibble(ID, var1_M1, var1_M2, var1_M3, var2_M1, var2_M2, var2_M3)

# A tibble: 3 x 7
  ID    var1_M1 var1_M2 var1_M3 var2_M1 var2_M2 var2_M3
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1 Dave       10      30      90      10      33      90
2 Joe        10      90     100      90      10      10
3 Steve      90      95      95      20     100      50

data %>% 
  mutate_at(vars(matches(c("M1","M2","M3"))),
            list(~ ifelse(. > 90, .,0))) #%>% 
  #mutate() using map_dfc and if_any ??

Expected output

# A tibble: 3 x 5
  ID       M1    M2    M3 output
  <chr> <dbl> <dbl> <dbl>  <dbl>
1 Dave      0     0     0      0
2 Joe       0     0   100      3
3 Steve     0   195    95      2

CodePudding user response:

We may reshape to 'long' format with pivot_longer, grouped by 'ID', get the sum across the columns, where the value is greater than 90, and create the output of the first column index value not 0 with max.col

library(tidyr)
library(dplyr)
pivot_longer(data, cols = -ID, names_to = c(".value"), 
   names_pattern = ".*_(.*)") %>%
   group_by(ID) %>% 
   summarise(across(everything(), ~ sum(.x[.x > 90]))) %>% 
   mutate(output = max.col(across(starts_with("M")) >0, "first") * 
     if_any(starts_with("M"), ~ .x > 0))

-output

# A tibble: 3 × 5
  ID       M1    M2    M3 output
  <chr> <dbl> <dbl> <dbl>  <int>
1 Dave      0     0     0      0
2 Joe       0     0   100      3
3 Steve     0   195    95      2

With pivot_longer, we specified a regex in names_pattern to capture the characters after the _ as a group ((...)) in column names i.e. M1, M2, M3) using .* - to match zero or more characters, and specify the names_to as .value so that it reshape to long format with the column names as M1, M2, M3 combining all the var_ columns.

After we summarise with the sum, use max.col to find the first index of column values greater than 0. In the first row, i.e. for Dave id, all the values are 0, thus max.col returns the index 1. In order to make it 0, multiply with TRUE/FALSE output from if_any (TRUE -> 1, FALSE -> 0). i.e. it may be better to create a column to understand it better

pivot_longer(data, cols = -ID, names_to = c(".value"), 
   names_pattern = ".*_(.*)") %>%
   group_by(ID) %>% 
   summarise(across(everything(), ~ sum(.x[.x > 90]))) %>%
   mutate(if_any_ind = if_any(starts_with("M"), ~ .x > 0),
           output_ind =max.col(across(starts_with("M")) >0, "first"),
           output = if_any_ind * output_ind)

-output

# A tibble: 3 × 7
  ID       M1    M2    M3 if_any_ind output_ind output
  <chr> <dbl> <dbl> <dbl> <lgl>           <int>  <int>
1 Dave      0     0     0 FALSE               1      0
2 Joe       0     0   100 TRUE                3      3
3 Steve     0   195    95 TRUE                2      2

ie. any value x 0 -> 0 and value x 1 -> 1

  • Related