- I want to count the rows in which the first column is 1 and rest other columns are 0. Because I want to see how many subscribers just watched the first episode and not the rest of the episodes.
- Same way just the last episode.
- All the episodes
- Rows where first and last episodes are 0 but any of the episodes in between can be 0 or 1 (at least one in between has to be 1 and it can be multiple ones). In other words, someone who watched just the episodes in between but not the first or last.
I have tried to use the dplyr
function in R using mutate
and ifelse
combined but it is not working.
The first column is the subscriber id and the other columns are episode air date (in order). 1
represents that the subscriber watched that episode and 0
subscriber didn't watch the episode.
CodePudding user response:
You can use the helper function dplyr::if_all()
inside dplyr::mutate()
like this:
library(dplyr)
# A simpler version of the dataset to illustrate
data <- tibble(
"2021-07-07" = sample(c(1, 0), 1000, replace = TRUE),
"2021-07-08" = sample(c(1, 0), 1000, replace = TRUE),
"2021-07-09" = sample(c(1, 0), 1000, replace = TRUE),
)
data %>%
mutate(
only_first = `2021-07-07` == 1 & if_all(-`2021-07-07`, ~ .x == 0)
)
Output:
# A tibble: 1,000 x 4
`2021-07-07` `2021-07-08` `2021-07-09` only_first
<dbl> <dbl> <dbl> <lgl>
1 0 1 1 FALSE
2 0 0 0 FALSE
3 0 1 0 FALSE
4 1 1 1 FALSE
5 1 0 1 FALSE
6 0 1 1 FALSE
7 1 1 0 FALSE
8 1 0 1 FALSE
9 1 1 0 FALSE
10 1 0 0 TRUE
# ... with 990 more rows
Breakdown of the logic
2021-07-07 == 1
matches rows where the first column equals 1if_all(-2021-07-07, ~ .x == 0)
matches rows where all columns except the first equals 0.- The
&
gives youTRUE
if both statements described over areTRUE
, andFALSE
if not.