grouped data new column if a column includes a specific condition

Time:03-24

I have a new problem with my grouped data: I want to add a new column that always says "yes" for a group when the column Caffeinefactor contains a "yes" in that group, and always says "no" if the column Caffeinefactor contains only "no" and not a single "yes" in that group.

I have tried the following code, but it does not give the desired result:

Anycaffeine <- Data2 %>%
  setDT(Data2) %>%
  dplyr::group_by(PATIENT.ID) %>%
  dplyr::mutate(Anycaffeine = ifelse(colSums(Caffeinefactor == "yes") > 0, "Yes", "No"))

My data looks something like this:

DF = structure(list(PATIENT.ID = c(210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 220909L, 
220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 
220909L, 220909L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 303926L, 303926L, 303926L, 303926L
), PATIENT.TREATMENT.NUMBER = c(1L, 2L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 1L, 2L, 3L, 4L, 
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 11L, 12L, 13L, 17L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 
10L, 12L, 13L, 14L, 15L, 16L, 1L, 2L, 3L, 4L), Caffeinefactor = c("no", 
"no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes", 
"yes", "yes", "no", "no", "yes", "yes", "yes", "yes", "yes", 
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no", 
"no", "no", "no", "no", "no", "no", "yes", "yes", "yes", "yes", 
"yes", "no", "no", "no", "no", "no", "no", "yes", "no", "yes", 
"yes", "yes", "yes", "yes", "yes", "yes", "no", "no", "no", "no"
)), row.names = c(NA, -60L), class = c("data.table", "data.frame"
))

CodePudding user response:

There are two issues in the mutate code:

  1. colSums on a logical vector (Caffeinefactor == "yes") does not work, because colSums/rowSums require a dim attribute, i.e. they work on a data.frame, matrix, tibble, etc.
  2. ifelse is not needed, because the expression only has to return a single value per group: the aim is to check whether the group contains any "yes". "yes" %in% Caffeinefactor returns a single TRUE/FALSE; adding 1 converts it to a numeric index (1 or 2) that selects "No" or "Yes" from the vector c("No", "Yes").
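Both points can be checked in isolation with base R. This is only a small sketch: the vector x is made-up data standing in for one patient's Caffeinefactor values, and the names res, has_yes and label are chosen here for illustration.

```r
# Toy vector standing in for one group's Caffeinefactor values (made-up data)
x <- c("no", "yes", "no")

# colSums() needs an object with at least two dimensions,
# so calling it on a plain logical vector raises an error
res <- tryCatch(colSums(x == "yes"), error = function(e) "error")
print(res)

# %in% yields a single TRUE/FALSE; adding 1 turns it into index 1 or 2
has_yes <- "yes" %in% x
label <- c("No", "Yes")[1 + has_yes]
print(label)
```

Because the index has length 1, mutate recycles the selected label across every row of the group.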
library(dplyr)

Data2 %>%
  group_by(PATIENT.ID) %>%
  # TRUE/FALSE + 1 gives index 2/1, which selects "Yes"/"No"
  mutate(Anycaffeine = c("No", "Yes")[1 + ("yes" %in% Caffeinefactor)]) %>%
  ungroup()
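If dplyr is not available, the same per-group check can probably be written in base R with ave(). The following is a minimal sketch, not part of the original answer: the data frame toy is a made-up miniature of the question's data, and only the column names PATIENT.ID, Caffeinefactor and Anycaffeine are taken from the question.

```r
# Made-up miniature of the question's data: two patients
toy <- data.frame(
  PATIENT.ID = c(1L, 1L, 1L, 2L, 2L),
  Caffeinefactor = c("no", "yes", "no", "no", "no"),
  stringsAsFactors = FALSE
)

# ave() applies FUN within each PATIENT.ID; the single value returned
# per group is recycled across all rows of that group
toy$Anycaffeine <- ave(
  toy$Caffeinefactor, toy$PATIENT.ID,
  FUN = function(v) if ("yes" %in% v) "Yes" else "No"
)
print(toy)
```

Using %in% rather than == inside the function keeps the check well defined when a group contains NA values, since %in% never returns NA.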