Home > Net >  Filtering in dplyr with multiple conditions
Filtering in dplyr with multiple conditions

Time:09-29

Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here.

year name count
2021 Bob 22
2021 Joe 3
2021 Meg 4
2022 Bob 30
2022 Joe 1
2022 Meg 10

Resulting Table:

year name count
2022 Bob 30
2022 Meg 10

The script that I have so far that got me to the top df:

df %>% 
  group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>%
  summarise(count = n()) %>%
  filter(year == c("2022", "2021") ) 

CodePudding user response:

We may arrange the data by 'name', 'year', then grouped by 'name', filter the groups having the second count greater than the first and the 'year' is 2022

library(dplyr)
df %>% 
  arrange(name, year) %>% 
  group_by(name) %>%
  filter(nth(count, 2) > first(count), year == 2022) %>%
  ungroup

-output

# A tibble: 2 × 3
   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10

data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2022L, 2022L, 2022L
), name = c("Bob", "Joe", "Meg", "Bob", "Joe", "Meg"), count = c(22L, 
3L, 4L, 30L, 1L, 10L)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

Here is an alternative approach for your example:

library(dplyr)

df %>% 
  group_by(name) %>% 
  slice_max(count) %>% 
  filter(year == 2022)
   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10
  • Related