Home > Back-end >  dplyr conditional filtering groups
dplyr conditional filtering groups

Time:09-13

I have a dataset which is kinda like this:

df <- tibble(group = c(1,1,1,2,2,3,3),
       type = c("HOME", "MAIL", "HOME", "HOME", "HOME",
                "MAIL", "MAIL"),
       count = c(3,4,4,3,3,2,2))

In my dplyr pipe, I would like to do a conditional filter, such that for each group if there is any type == "HOME", it will filter out from that group any TYPE == "MAIL", if there is no "HOME", then keep the "MAIL". As eventually I summarize the count, and would like the output of that to be group 1 = 7, group 2 = 6, group 3 = 4.

Any help would be appreciated

CodePudding user response:

We could convert to ordered factor and get the first level filtered

library(dplyr)
df %>% 
  mutate(type = ordered(type, c("HOME", "MAIL"))) %>% 
  group_by(group) %>% 
  summarise(count = sum(count[type %in% 
         first(levels(droplevels(type)))]))

-output

# A tibble: 3 × 2
  group count
  <dbl> <dbl>
1     1     7
2     2     6
3     3     4

Or could also use min

df %>% 
  group_by(group) %>% 
   summarise(count = sum(count[type ==min(ordered(type, c("HOME", "MAIL")))]))

CodePudding user response:

Another option is to use an if else to filter, so that if HOME is present in a group, then we remove MAIL, or else we keep MAIL if HOME is not in the group. Then, finally we can summarise.

library(tidyverse)

df %>%
  group_by(group) %>%
  filter(if (any(str_detect(type, 'HOME')))
    type != "MAIL"
    else
      type == "MAIL") %>%
  summarise(count = sum(count))

Output

  group count
  <dbl> <dbl>
1     1     7
2     2     6
3     3     4
  • Related