I need help with some data. I have two variables, ids and activity, and I want to create a new variable, flag.
Both ids and activity may or may not repeat. However, activity can take only two possible values, a or b.
If, for a particular id, activity includes only a, then flag = 0.
If, for a particular id, activity includes both a and b, then every a should be flagged as 1 and every b as 2.
Note: activity "b" will not appear by itself.
Sample data:
ids <- c(1,1,1,2,4,4,4,7,7,11,13,13,13)
activity <- c("a","a","b","a","a","a","a","a","b","a","a","b","b")
df <- data.frame(ids, activity)
The expected outcome, as a data frame (it includes a few more ids than the sample above):
ids <- c(1,1,1,2,4,4,4,7,7,11,13,13,13,17,17,19,19,19,19)
activity <- c("a","a","b","a","a","a","a","a","b","a","a","b","b","a","a","a","a","b","b")
flag <- c(1,1,2,0,0,0,0,1,2,0,1,2,2,0,0,1,1,2,2)
df <- data.frame(ids, activity, flag)
Also, I am new to R, so any suggestions on which packages and functions I should learn more about for this kind of problem would be helpful.
CodePudding user response:
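You may use case_when and include the different conditions in it. The conditions are checked in order and the first match wins, so the all-a check has to come first.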
library(dplyr)
df <- df %>%
  group_by(ids) %>%
  mutate(flag = case_when(all(activity == "a") ~ 0,
                          activity == "a" ~ 1,
                          activity == "b" ~ 2)) %>%
  ungroup()
df
# ids activity flag
# <dbl> <chr> <dbl>
# 1 1 a 1
# 2 1 a 1
# 3 1 b 2
# 4 2 a 0
# 5 4 a 0
# 6 4 a 0
# 7 4 a 0
# 8 7 a 1
# 9 7 b 2
#10 11 a 0
#11 13 a 1
#12 13 b 2
#13 13 b 2
#14 17 a 0
#15 17 a 0
#16 19 a 1
#17 19 a 1
#18 19 b 2
#19 19 b 2
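As a quick sanity check, the grouped case_when result can be compared against the flag column given in the question. This is only a sketch: the names res and expected are made up for illustration, and the data is copied from the question's expected outcome.

```r
library(dplyr)

# Data and expected flags, copied from the question
ids <- c(1,1,1,2,4,4,4,7,7,11,13,13,13,17,17,19,19,19,19)
activity <- c("a","a","b","a","a","a","a","a","b","a","a","b","b","a","a","a","a","b","b")
expected <- c(1,1,2,0,0,0,0,1,2,0,1,2,2,0,0,1,1,2,2)

res <- data.frame(ids, activity) %>%
  group_by(ids) %>%
  mutate(flag = case_when(
    all(activity == "a") ~ 0,  # checked first: the whole id group is "a"
    activity == "a"      ~ 1,  # reached only when the group also contains "b"
    activity == "b"      ~ 2
  )) %>%
  ungroup()

# Stops with an error if any computed flag differs from the expected one
stopifnot(all(res$flag == expected))
```

If the first two conditions were swapped, every a would match activity == "a" before the group-level check was reached, so no row would ever get 0.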
CodePudding user response:
Most things in R can be done without additional packages, which are often confusing for beginners. In base R, using within, you could first set all a to 1 and all b to 2, then use ave to count the distinct activity values per id and set the 1 flag to zero where that count is one.
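df <- within(df, {
  # start with a -> 1 and b -> 2
  flag <- ifelse(activity == 'a', 1, 2)
  # ids with only one distinct activity (only a) get their 1 reset to 0
  flag[flag == 1 & ave(activity, ids, FUN = \(x) length(unique(x))) == 1] <- 0
})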
df
# ids activity flag
# 1 1 a 1
# 2 1 a 1
# 3 1 b 2
# 4 2 a 0
# 5 4 a 0
# 6 4 a 0
# 7 4 a 0
# 8 7 a 1
# 9 7 b 2
# 10 11 a 0
# 11 13 a 1
# 12 13 b 2
# 13 13 b 2
# 14 17 a 0
# 15 17 a 0
# 16 19 a 1
# 17 19 a 1
# 18 19 b 2
# 19 19 b 2
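To see what the ave step contributes, it may help to look at it on its own. The sketch below is a variant, not the answer's exact code: it passes a logical vector to ave and asks whether each id has any b, and the helper name has_b is made up for illustration. In the call above, ave receives the character column, so the counts come back as character strings and the == 1 comparison relies on coercion; using a logical input is one way to avoid that.

```r
# A small subset of the question's data
ids <- c(1, 1, 1, 2, 4, 4, 4, 7, 7)
activity <- c("a", "a", "b", "a", "a", "a", "a", "a", "b")

# ave() applies FUN within each id and returns a vector as long as the input,
# so every row of an id gets that id's group-level answer.
has_b <- ave(activity == "b", ids, FUN = any)

# b -> 2; a -> 1 when the id also has a b, otherwise 0
flag <- ifelse(activity == "b", 2, ifelse(has_b, 1, 0))

data.frame(ids, activity, has_b, flag)
```

Because has_b is already aligned row by row with the data, the flag can be derived with plain vectorised ifelse calls and no explicit loop over ids.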
Data
df <- structure(list(ids = c(1, 1, 1, 2, 4, 4, 4, 7, 7, 11, 13, 13,
13, 17, 17, 19, 19, 19, 19), activity = c("a", "a", "b", "a",
"a", "a", "a", "a", "b", "a", "a", "b", "b", "a", "a", "a", "a",
"b", "b")), row.names = c(NA, -19L), class = "data.frame")