In R, conditionally mutate a new column for all of an ID's rows

Time:05-30

Background

I've got this dataframe df:

df <- data.frame(ID =    c("a","a","a","b", "c","c","c","c"),
                 event = c("red","black","blue","white", "orange","red","gray","green"),
                 stringsAsFactors=FALSE)

It's got some people in it (ID) and a description of an event. I'd like to make a new variable condition that indicates 1 or 0 based on whether any of the cells for a given ID contain either "red" or "blue".

The Problem

I can get this to work, but only for the matching row. What I'd like is that if any of a person's cells in event contain "red" or "blue", all of their cells in condition are marked 1. In other words, I'd like this:

ID  event condition
 a    red         1
 a  black         1
 a   blue         1
 b  white         0
 c orange         1
 c    red         1
 c   gray         1
 c  green         1

What I've tried

So far, I've used this code to get this result:

library(dplyr)

df <- df %>%
  mutate(condition = ifelse(event %in% c("red","blue"), 1, 0))

ID  event condition
 a    red         1
 a  black         0
 a   blue         1
 b  white         0
 c orange         0
 c    red         1
 c   gray         0
 c  green         0

In other words, the rows that match are marked 1, but I'd like all rows for an ID with any matching row to be marked 1.

CodePudding user response:

We need any() wrapped around the logical vector returned by %in%, evaluated within each ID group; the arguments to %in% can also be reversed. (The OP's code returns 1 only on the rows where event matches 'red' or 'blue', leaving the others 0.)

library(dplyr)
df %>% 
   group_by(ID) %>% 
   mutate(condition = +(any(c('red', 'blue') %in% event))) %>%  # unary + coerces TRUE/FALSE to 1/0
   ungroup()

-output

# A tibble: 8 × 3
  ID    event  condition
  <chr> <chr>      <int>
1 a     red            1
2 a     black          1
3 a     blue           1
4 b     white          0
5 c     orange         1
6 c     red            1
7 c     gray           1
8 c     green          1
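
For comparison, the same grouped any() idea can be sketched in base R without dplyr. This is only an illustration under the assumption that the data looks like the df in the question; the helper vector hit is a name introduced here, and ave() is used to apply any() within each ID and recycle the result to every row of that ID.

```r
# Same toy data as in the question
df <- data.frame(ID =    c("a","a","a","b", "c","c","c","c"),
                 event = c("red","black","blue","white", "orange","red","gray","green"),
                 stringsAsFactors = FALSE)

# Row-wise flag: TRUE only on rows whose event is exactly "red" or "blue"
# ('hit' is a helper name introduced for this sketch)
hit <- df$event %in% c("red", "blue")

# ave() splits 'hit' by ID, applies any() to each group, and writes the
# single group result back to every row of that group
df$condition <- as.integer(ave(hit, df$ID, FUN = any))

print(df)
```

The row-wise vector hit corresponds to what the question's attempt computed; the grouping step is what spreads a match to all rows of an ID.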

CodePudding user response:

Here is an alternative approach:

library(dplyr)
library(stringr)

df %>% 
  group_by(ID) %>% 
  # any() collapses the row-wise matches to one value per ID, recycled to every row of the group
  mutate(condition = if_else(any(str_detect(event, paste(c("red", "blue"), collapse = "|"))), 1, 0)) %>% 
  ungroup()

-output

  ID    event  condition
  <chr> <chr>      <dbl>
1 a     red            1
2 a     black          1
3 a     blue           1
4 b     white          0
5 c     orange         1
6 c     red            1
7 c     gray           1
8 c     green          1
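
One thing to keep in mind with the pattern-based approach: the string built by paste(..., collapse = "|") is a regular expression, so it matches substrings, whereas %in% matches whole values only. The sketch below shows the difference using base R's grepl(), which plays a similar role to str_detect() here. The value "reddish" is a hypothetical example that does not appear in the question's data, and the variable names are introduced only for illustration.

```r
# "reddish" is a made-up value, added only to show substring matching
events  <- c("red", "blue", "reddish", "gray")
targets <- c("red", "blue")

# Exact, whole-value matching
exact <- events %in% targets

# Regex alternation "red|blue" matches anywhere inside the string
pattern <- paste(targets, collapse = "|")
partial <- grepl(pattern, events)

# Anchoring the pattern restricts it to whole-value matches again
anchored <- grepl(paste0("^(", pattern, ")$"), events)

print(data.frame(events, exact, partial, anchored))
```

If the event column can contain values that merely include a target word, the exact or anchored variants are probably the safer choice.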