I want to create a YES/NO flag in a data set such that if a variable in VARIABLE
column is present in every STUDY
then YES, otherwise NO. so for example when AGE is present in all studies then the flag shoudl be YES otherwise it is no.
A simple example is here but my data is much bigger and has other columnes as well:
STUDY VARIABLE
STUDY1 AGE
STUDY1 SEX
STUDY1 WT
STUDY2 AGE
STUDY2 RACE
STUDY2 WT
STUDY3 AGE
STUDY3 SEX
STUDY3 WT
Output should be:
STUDY VARIABLE FLAG
STUDY1 AGE YES
STUDY1 SEX NO
STUDY1 WT YES
STUDY2 AGE YES
STUDY2 RACE NO
STUDY2 WT YES
STUDY3 AGE YES
STUDY3 SEX NO
STUDY3 WT YES
How can i do that in R using `data.table' ?
CodePudding user response:
You may try
df[, n := (.N == 3), by = VARIABLE][, FLAG := c("YES", "NO")[match(FLAG, c(TRUE, FALSE))]]
STUDY VARIABLE FLAG
1: STUDY1 AGE YES
2: STUDY1 SEX NO
3: STUDY1 WT YES
4: STUDY2 AGE YES
5: STUDY2 RACE NO
6: STUDY2 WT YES
7: STUDY3 AGE YES
8: STUDY3 SEX NO
9: STUDY3 WT YES
If you need generalized way, please let me know(with specific condition will be grateful).
df[, FLAG := (.N == df[,uniqueN(n = uniqueN(STUDY), STUDY)]), by = VARIABLE][, FLAG := c("YES", "NO")[match(FLAG, c(TRUE, FALSE))]]
CodePudding user response:
You can check for unique counts of STUDY
in each VARIABLE
.
library(data.table)
setDT(df) #If needed to convert to `data.table`
df[, FLAG := ifelse(uniqueN(STUDY) == uniqueN(df$STUDY), 'YES', 'NO'), VARIABLE]
df
# STUDY VARIABLE FLAG
#1: STUDY1 AGE YES
#2: STUDY1 SEX NO
#3: STUDY1 WT YES
#4: STUDY2 AGE YES
#5: STUDY2 RACE NO
#6: STUDY2 WT YES
#7: STUDY3 AGE YES
#8: STUDY3 SEX NO
#9: STUDY3 WT YES