I am using RStudio, mostly with dplyr. I have a data frame of users (A, B, C, ...) and the days since their first visit on which they were active (1, 2, 3, ...):
user | day | active |
---|---|---|
A | 1 | T |
A | 3 | T |
B | 2 | T |
B | 4 | T |
I would like to complete this list with all missing days, up to each user's current maximum (so up to day 4 for user B and up to day 3 for user A), with `active` set to FALSE:
user | day | active |
---|---|---|
A | 1 | T |
A | 2 | F |
A | 3 | T |
B | 1 | F |
B | 2 | T |
B | 3 | F |
B | 4 | T |
I've been googling and chewing on this for hours now. Does anybody have an idea?
Answer:
We could group by `user` and then get the sequence from `min` to `max` of `day` in `complete` to expand the data, while `fill`ing the `active` column with `FALSE` (by default, missing combinations are filled with `NA`):
```r
library(dplyr)
library(tidyr)

df1 %>%
  group_by(user) %>%
  # within each user, add every day between the first and last recorded day
  complete(day = min(day):max(day), fill = list(active = FALSE)) %>%
  ungroup()
```
Output:
```
# A tibble: 6 × 3
  user    day active
  <chr> <int> <lgl>
1 A         1 TRUE
2 A         2 FALSE
3 A         3 TRUE
4 B         2 TRUE
5 B         3 FALSE
6 B         4 TRUE
```
Data:
```r
df1 <- structure(list(user = c("A", "A", "B", "B"), day = c(1L, 3L, 2L, 4L),
    active = c(TRUE, TRUE, TRUE, TRUE)), class = "data.frame",
    row.names = c(NA, -4L))
```
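Note that the question wants every user to start at day 1 (user B should get a `FALSE` row for day 1), whereas `min(day):max(day)` starts each user at their first recorded day, which is why B starts at day 2 in the output above. A minimal variant, assuming the same `df1`, simply uses 1 as the lower bound:

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(user = c("A", "A", "B", "B"),
                  day = c(1L, 3L, 2L, 4L),
                  active = c(TRUE, TRUE, TRUE, TRUE))

result <- df1 %>%
  group_by(user) %>%
  # start every user's sequence at day 1 instead of at their first recorded day
  complete(day = 1:max(day), fill = list(active = FALSE)) %>%
  ungroup()
```

This should give seven rows, matching the desired table in the question.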
Answer:
You can create a new data frame containing every user/day combination you want (hard-coded here for the example data), left-join your existing data frame onto it, and then set `active` to FALSE wherever the join produced `NA`. Something like this:
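```r
library(dplyr)
# Every user/day combination that should appear: days 1-3 for A, 1-4 for B
fullDf <- data.frame(user = c(rep("A", 3), rep("B", 4)),
                     day = c(1:3, 1:4))
existingDf <- left_join(fullDf, existingDf, by = c("user", "day"))
# Days missing from the original data come back as NA; turn them into FALSE
existingDf$active <- ifelse(is.na(existingDf$active), FALSE, existingDf$active)
```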
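Typing out `fullDf` by hand only works for a toy example. A sketch that derives it from the data instead, assuming the question's data frame is called `existingDf` with columns `user`, `day` and `active` (the helper table `lastDay` is a hypothetical name), could look like this; it swaps the `ifelse()` step for dplyr's `coalesce()`:

```r
library(dplyr)

# Example data from the question
existingDf <- data.frame(user = c("A", "A", "B", "B"),
                         day = c(1L, 3L, 2L, 4L),
                         active = c(TRUE, TRUE, TRUE, TRUE))

# Hypothetical helper table: each user's last recorded day
lastDay <- existingDf %>%
  group_by(user) %>%
  summarise(maxDay = max(day), .groups = "drop")

# One row per user per day, from 1 up to that user's last day
fullDf <- data.frame(user = rep(lastDay$user, lastDay$maxDay),
                     day = sequence(lastDay$maxDay))

# Join the observed rows back in and treat unmatched days as inactive
result <- fullDf %>%
  left_join(existingDf, by = c("user", "day")) %>%
  mutate(active = coalesce(active, FALSE))
```

`rep()` repeats each user name `maxDay` times and `sequence()` produces the matching runs `1:maxDay`, so the grid adapts to however many users and days the data contains.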