Identify and mark batch group where string is present anywhere in group

Time:01-10

Consider data like this, in which each Subject has recorded an Annotation of "x" or "m" for every Trial:

df <- data.frame(
  Subject = c(rep("Rater_1",9), rep("Rater_2",9), rep("Rater_3",9)),
  Trial = c(1:9,1:9,1:9),
  Annotation = c(rep("x",4),rep("m",5),
                 rep("x",2),rep("m",4), rep("x",3),
                 rep("x",1),rep("m",8)),
  batch = rep(c(0,0,0,1,1,1,2,2,2),3)
)

I want to identify every batch in which a Subject has used "x". For each Subject and batch, if "x" appears anywhere in that batch, I would like to record the value "xx" for every row of the batch.
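The "identify" half of the task can be checked on its own with a grouped summary that returns one TRUE/FALSE per Subject/batch group. This is a sketch that assumes dplyr 1.0 or later (for the .groups argument of summarise()); batch_flags is just an illustrative name.

```r
library(dplyr)

df <- data.frame(
  Subject = c(rep("Rater_1", 9), rep("Rater_2", 9), rep("Rater_3", 9)),
  Trial = c(1:9, 1:9, 1:9),
  Annotation = c(rep("x", 4), rep("m", 5),
                 rep("x", 2), rep("m", 4), rep("x", 3),
                 rep("x", 1), rep("m", 8)),
  batch = rep(c(0, 0, 0, 1, 1, 1, 2, 2, 2), 3)
)

# One row per Subject/batch group: does "x" occur anywhere in that group?
batch_flags <- df %>%
  group_by(Subject, batch) %>%
  summarise(has_x = any(Annotation == "x"), .groups = "drop")

batch_flags
```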

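This records "xx" only in the rows where Annotation itself is "x"; it does not spread the new value to the rest of the batch:

library(dplyr)
library(stringr)  # str_detect() comes from stringr, not dplyr
df %>%
  group_by(Subject, batch) %>%
  # if_any() is evaluated row by row, so only the rows that contain "x" become "xx"
  mutate(Annotation_0 = ifelse(if_any(Annotation, ~str_detect(., 'x')), "xx", Annotation))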
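The gap between the attempt above and the desired result is row-wise versus group-wise evaluation. The hypothetical three-row toy data below (one Subject, one batch) illustrates it: a plain comparison gives one value per row, while any() collapses the whole group to a single value that mutate() then recycles to every row of the group.

```r
library(dplyr)

# Hypothetical toy data: one Subject, one batch, annotations "x", "m", "m"
toy <- data.frame(Subject = "Rater_1", batch = 0, Annotation = c("x", "m", "m"))

toy_res <- toy %>%
  group_by(Subject, batch) %>%
  mutate(
    row_wise   = Annotation == "x",      # one TRUE/FALSE per row: TRUE, FALSE, FALSE
    group_wise = any(Annotation == "x")  # one TRUE for the group, recycled to all 3 rows
  )

toy_res
```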

How can this desired output be achieved?

# A tibble: 27 × 5
# Groups:   Subject, batch [9]
   Subject Trial Annotation batch Annotation_0
   <chr>   <int> <chr>      <dbl> <chr>       
 1 Rater_1     1 x              0 xx          
 2 Rater_1     2 x              0 xx          
 3 Rater_1     3 x              0 xx          
 4 Rater_1     4 x              1 xx          
 5 Rater_1     5 m              1 xx           
 6 Rater_1     6 m              1 xx           
 7 Rater_1     7 m              2 m           
 8 Rater_1     8 m              2 m           
 9 Rater_1     9 m              2 m           
10 Rater_2     1 x              0 xx          
11 Rater_2     2 x              0 xx          
12 Rater_2     3 m              0 xx           
13 Rater_2     4 m              1 m           
14 Rater_2     5 m              1 m           
15 Rater_2     6 m              1 m           
16 Rater_2     7 x              2 xx          
17 Rater_2     8 x              2 xx          
18 Rater_2     9 x              2 xx          
19 Rater_3     1 x              0 xx          
20 Rater_3     2 m              0 xx           
21 Rater_3     3 m              0 xx           
22 Rater_3     4 m              1 m           
23 Rater_3     5 m              1 m           
24 Rater_3     6 m              1 m           
25 Rater_3     7 m              2 m           
26 Rater_3     8 m              2 m           
27 Rater_3     9 m              2 m 

Answer:

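You can use any() inside the grouped mutate(), so the test covers the whole Subject/batch group instead of a single row: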

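library(dplyr)

df %>%
  group_by(Subject, batch) %>%
  # any() collapses the whole Subject/batch group to a single TRUE/FALSE;
  # if () then returns "xx" (recycled to every row) or leaves Annotation unchanged.
  # ifelse() with a length-1 test would return only Annotation[1] in the FALSE case.
  mutate(Annotation_0 = if (any(Annotation == "x")) "xx" else Annotation)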
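If you prefer base R, roughly the same idea can be sketched with ave(), which applies a function within groups and writes the result back to every row of each group. This is a swapped-in technique rather than the answer's dplyr method, and it assumes the df from the question (rebuilt here so the block runs on its own).

```r
# Rebuild the question's data; stringsAsFactors = FALSE keeps Annotation as
# character on R versions older than 4.0 as well
df <- data.frame(
  Subject = c(rep("Rater_1", 9), rep("Rater_2", 9), rep("Rater_3", 9)),
  Trial = c(1:9, 1:9, 1:9),
  Annotation = c(rep("x", 4), rep("m", 5),
                 rep("x", 2), rep("m", 4), rep("x", 3),
                 rep("x", 1), rep("m", 8)),
  batch = rep(c(0, 0, 0, 1, 1, 1, 2, 2, 2), 3),
  stringsAsFactors = FALSE
)

# ave() splits the logical vector by Subject and batch, applies any() to each
# group, and writes the single TRUE/FALSE back to every row of that group
has_x <- ave(df$Annotation == "x", df$Subject, df$batch, FUN = any)

# has_x now has one value per row, so a row-wise ifelse() is safe here
df$Annotation_0 <- ifelse(has_x, "xx", df$Annotation)

df
```

Because the result is assigned back by position, the row order of df is left unchanged.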