new variables based on repeating two variables

Time:12-17

I need help with some data. I have two variables, ids and activity, and I want to create a new variable, flag. Values of ids and activity may or may not repeat. However, activity can take only two possible values, "a" or "b".

If, for a particular id, activity includes only "a", then flag = 0.

If, for a particular id, activity includes both "a" and "b", then every "a" should be flagged as 1 and every "b" as 2.

Note: an id never has activity "b" without also having "a".

Sample data

ids <- c(1,1,1,2,4,4,4,7,7,11,13,13,13)
activity <- c("a","a","b","a","a","a","a","a","b","a","a","b","b")
df <- data.frame(ids, activity)

The expected outcome, as a data frame, is below (it also covers two further ids, 17 and 19, that are not in the sample above):

ids <- c(1,1,1,2,4,4,4,7,7,11,13,13,13,17,17,19,19,19,19)
activity <- c("a","a","b","a","a","a","a","a","b","a","a","b","b","a","a","a","a","b","b")
flag <- c(1,1,2,0,0,0,0,1,2,0,1,2,2,0,0,1,1,2,2)
df <- data.frame(ids, activity, flag)

Also, I am new to R, so any suggestions on which packages and functions I should learn more about for this kind of problem would be helpful.

CodePudding user response:

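You can use case_when() from dplyr and list the conditions in order. The first condition that matches a row determines its flag, so the check for an id that has only "a" goes first.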

library(dplyr)

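# Evaluate the conditions within each id; the first one that matches wins.
df <- df %>%
  group_by(ids) %>%
  mutate(flag = case_when(all(activity == "a") ~ 0,  # id has only "a"
                          activity == "a" ~ 1,       # "a" in an id that also has "b"
                          activity == "b" ~ 2)) %>%
  ungroup()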

df

#     ids activity  flag
#   <dbl> <chr>    <dbl>
# 1     1 a            1
# 2     1 a            1
# 3     1 b            2
# 4     2 a            0
# 5     4 a            0
# 6     4 a            0
# 7     4 a            0
# 8     7 a            1
# 9     7 b            2
#10    11 a            0
#11    13 a            1
#12    13 b            2
#13    13 b            2
#14    17 a            0
#15    17 a            0
#16    19 a            1
#17    19 a            1
#18    19 b            2
#19    19 b            2
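
To see why the order of the conditions matters, here is a minimal sketch on two hand-made vectors, assuming dplyr is installed. The names x_mixed, x_only_a and flag_for are made up for illustration and stand in for the activity values of a single id.

```r
library(dplyr)

# Hypothetical stand-ins for the activity values of one id each.
x_mixed  <- c("a", "a", "b")
x_only_a <- c("a", "a")

# Hypothetical helper wrapping the same three conditions, in the same order.
flag_for <- function(x) {
  case_when(
    all(x == "a") ~ 0,  # checked first; the single TRUE/FALSE is recycled to every element
    x == "a"      ~ 1,
    x == "b"      ~ 2
  )
}

flag_for(x_mixed)   # 1 1 2
flag_for(x_only_a)  # 0 0
```

If the all() condition were listed after activity == "a", every "a" would already have matched and an id with only "a" would get 1 instead of 0.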

CodePudding user response:

Most things in R can be done without additional packages, which is often confusing for beginners. In base R, using within(), you could first set every "a" to 1 and every "b" to 2, then use ave() to count the distinct activity values per id and reset the flag from 1 to 0 where that count is one.

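df <- within(df, {
  # start with 1 for every "a" and 2 for every "b"
  flag <- ifelse(activity == 'a', 1, 2)
  # reset 1 to 0 for ids with a single distinct activity (the \(x) lambda needs R >= 4.1)
  flag[flag == 1 & ave(activity, ids, FUN=\(x) length(unique(x))) == 1] <- 0
})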
df
#    ids activity flag
# 1    1        a    1
# 2    1        a    1
# 3    1        b    2
# 4    2        a    0
# 5    4        a    0
# 6    4        a    0
# 7    4        a    0
# 8    7        a    1
# 9    7        b    2
# 10  11        a    0
# 11  13        a    1
# 12  13        b    2
# 13  13        b    2
# 14  17        a    0
# 15  17        a    0
# 16  19        a    1
# 17  19        a    1
# 18  19        b    2
# 19  19        b    2
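
As a variation on the same idea, the sketch below asks directly whether an id contains any "b", rather than counting distinct values. It is only one possible way to write it. The name has_b is made up for illustration, and the data is the 19-row example used in this answer.

```r
# The 19-row example data.
df <- data.frame(
  ids = c(1, 1, 1, 2, 4, 4, 4, 7, 7, 11, 13, 13, 13, 17, 17, 19, 19, 19, 19),
  activity = c("a", "a", "b", "a", "a", "a", "a", "a", "b", "a",
               "a", "b", "b", "a", "a", "a", "a", "b", "b")
)

# TRUE for every row whose id has at least one "b".
# ave() is given a numeric 0/1 vector, so its result stays numeric.
has_b <- ave(as.numeric(df$activity == "b"), df$ids, FUN = max) == 1

# 0 when the id has no "b"; otherwise 1 for "a" and 2 for "b".
df$flag <- ifelse(!has_b, 0, ifelse(df$activity == "a", 1, 2))
df
```

Passing a numeric vector to ave() avoids relying on the comparison between a character result and the number 1.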

Data

df <- structure(list(ids = c(1, 1, 1, 2, 4, 4, 4, 7, 7, 11, 13, 13, 
13, 17, 17, 19, 19, 19, 19), activity = c("a", "a", "b", "a", 
"a", "a", "a", "a", "b", "a", "a", "b", "b", "a", "a", "a", "a", 
"b", "b")), row.names = c(NA, -19L), class = "data.frame")
Tags: r