Home > other >  Count the row if the first column has 1 and the rest columns are all zero
Count the row if the first column has 1 and the rest columns are all zero

Time:12-31

  • I want to count the rows in which the first column is 1 and rest other columns are 0. Because I want to see how many subscribers just watched the first episode and not the rest of the episodes.
  • Same way just the last episode.
  • All the episodes
  • Rows where first and last episodes are 0 but any of the episodes in between can be 0 or 1 (at least one in between has to be 1 and it can be multiple ones). In other words, someone who watched just the episodes in between but not the first or last.

I have tried to use the dplyr function in R using mutate and ifelse combined but it is not working.

enter image description here

The first column is the subscriber id and the other columns are episode air date (in order). 1 represents that the subscriber watched that episode and 0 subscriber didn't watch the episode.

CodePudding user response:

You can use the helper function dplyr::if_all() inside dplyr::mutate() like this:

library(dplyr)

# A simpler version of the dataset to illustrate
data <- tibble(
  "2021-07-07" = sample(c(1, 0), 1000, replace = TRUE),
  "2021-07-08" = sample(c(1, 0), 1000, replace = TRUE),
  "2021-07-09" = sample(c(1, 0), 1000, replace = TRUE),
)


data %>% 
  mutate(
    only_first = `2021-07-07` == 1 & if_all(-`2021-07-07`, ~ .x == 0)
  )

Output:

# A tibble: 1,000 x 4
   `2021-07-07` `2021-07-08` `2021-07-09` only_first
          <dbl>        <dbl>        <dbl> <lgl>     
 1            0            1            1 FALSE     
 2            0            0            0 FALSE     
 3            0            1            0 FALSE     
 4            1            1            1 FALSE     
 5            1            0            1 FALSE     
 6            0            1            1 FALSE     
 7            1            1            0 FALSE     
 8            1            0            1 FALSE     
 9            1            1            0 FALSE     
10            1            0            0 TRUE      
# ... with 990 more rows

Breakdown of the logic

  • 2021-07-07 == 1 matches rows where the first column equals 1
  • if_all(-2021-07-07, ~ .x == 0) matches rows where all columns except the first equals 0.
  • The & gives you TRUE if both statements described over are TRUE, and FALSE if not.
  • Related