I have a dataset which is kinda like this:
df <- tibble(group = c(1,1,1,2,2,3,3),
type = c("HOME", "MAIL", "HOME", "HOME", "HOME",
"MAIL", "MAIL"),
count = c(3,4,4,3,3,2,2))
In my dplyr
pipe, I would like to do a conditional filter, such that for each group if there is any type == "HOME"
, it will filter out from that group any TYPE == "MAIL"
, if there is no "HOME", then keep the "MAIL". As eventually I summarize the count, and would like the output of that to be
group 1 = 7, group 2 = 6, group 3 = 4.
Any help would be appreciated
CodePudding user response:
We could convert to ordered
factor and get the first level
filtered
library(dplyr)
df %>%
mutate(type = ordered(type, c("HOME", "MAIL"))) %>%
group_by(group) %>%
summarise(count = sum(count[type %in%
first(levels(droplevels(type)))]))
-output
# A tibble: 3 × 2
group count
<dbl> <dbl>
1 1 7
2 2 6
3 3 4
Or could also use min
df %>%
group_by(group) %>%
summarise(count = sum(count[type ==min(ordered(type, c("HOME", "MAIL")))]))
CodePudding user response:
Another option is to use an if else
to filter
, so that if HOME
is present in a group, then we remove MAIL
, or else we keep MAIL
if HOME
is not in the group. Then, finally we can summarise
.
library(tidyverse)
df %>%
group_by(group) %>%
filter(if (any(str_detect(type, 'HOME')))
type != "MAIL"
else
type == "MAIL") %>%
summarise(count = sum(count))
Output
group count
<dbl> <dbl>
1 1 7
2 2 6
3 3 4