Home > Software design >  find record within a group further subsetted by certain values and mutate a flag
find record within a group further subsetted by certain values and mutate a flag

Time:11-08

I have an example df:

df <- data.frame(
  group = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c", "d", "d", "d", "e", "e", "e", "e"),
  col1 = c(-36,10,-5,1, 0, 5,10, 5, 20, 2, -1, 1, 2, -10, -5, -1, 10 )

)

   group col1
1      a  -36
2      a   10
3      a   -5
4      a    1
5      b    0
6      b    5
7      c   10
8      c    5
9      c   20
10     c    2
11     d   -1
12     d    1
13     d    2
14     e  -10
15     e   -5
16     e   -1
17     e   10

and I want to mutate a flag such that if grouped by 'group, if there is a value of 1 in col1, set that to "Y" and all other records in that group to blank. If there is a value of less than or equal to one, set the record with the highest col1 to "Y" and all other records in that group to blank (but col1 cannot be greater than 1 - these records must have blank flag). Otherwise, set every record to blank. Every record above value of 1 should have blank flag.

This is what I've tried:

df3 <- df %>% mutate(flag = case_when(
  any(col1 == 1) ~ ifelse(col1 == 1, "Y", ""),
  any(col1 < 1) & !any(col1 == 1) ~ ifelse(col1 < 1 & col1 == max(col1), "Y", ""),
  TRUE ~ ""))

This is the expected outcome:

   group col1 flag
1      a  -36     
2      a   10     
3      a   -5     
4      a    1    Y
5      b    0    Y
6      b    5     
7      c   10     
8      c    5     
9      c   20     
10     c    2     
11     d   -1     
12     d    1    Y
13     d    2     
14     e  -10     
15     e   -5     
16     e   -1    Y
17     e   10    

CodePudding user response:

We could use if/else condition after grouping by 'group'

library(dplyr)
df %>% 
  group_by(group) %>%
  mutate(flag = c("", "Y")[1  (if(1 %in% col1) col1 == 1 else 
    if(any(col1 < 1))  col1 == max(col1[col1 <= 1]) else FALSE)]) %>% 
 # compact version with a warning
 #  mutate(flag = c("", "Y")[1 (col1 == max(col1[col1 <=1], na.rm = TRUE))]) %>%
  ungroup

-output

# A tibble: 17 × 3
   group  col1 flag 
   <chr> <dbl> <chr>
 1 a       -36 ""   
 2 a        10 ""   
 3 a        -5 ""   
 4 a         1 "Y"  
 5 b         0 "Y"  
 6 b         5 ""   
 7 c        10 ""   
 8 c         5 ""   
 9 c        20 ""   
10 c         2 ""   
11 d        -1 ""   
12 d         1 "Y"  
13 d         2 ""   
14 e       -10 ""   
15 e        -5 ""   
16 e        -1 "Y"  
17 e        10 ""   

CodePudding user response:

Another option with case_when

library(tidyverse)

df |>
  group_by(group) |>
  mutate(flag = case_when(
           col1 > 1 ~ "",
           col1 == 1 ~ "Y",
           col1 == max(col1[col1 <= 1]) ~ "Y",
           TRUE ~ ""
         ))
#> # A tibble: 17 x 3
#> # Groups:   group [5]
#>    group  col1 flag 
#>    <chr> <dbl> <chr>
#>  1 a       -36 ""   
#>  2 a        10 ""   
#>  3 a        -5 ""   
#>  4 a         1 "Y"  
#>  5 b         0 "Y"  
#>  6 b         5 ""   
#>  7 c        10 ""   
#>  8 c         5 ""   
#>  9 c        20 ""   
#> 10 c         2 ""   
#> 11 d        -1 ""   
#> 12 d         1 "Y"  
#> 13 d         2 ""   
#> 14 e       -10 ""   
#> 15 e        -5 ""   
#> 16 e        -1 "Y"  
#> 17 e        10 ""
  • Related