Filtering in dplyr with multiple conditions-CodePudding

Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here.

year	name	count
2021	Bob	22
2021	Joe	3
2021	Meg	4
2022	Bob	30
2022	Joe	1
2022	Meg	10

Resulting Table:

year	name	count
2022	Bob	30
2022	Meg	10

The script that I have so far that got me to the top df:

df %>% 
  group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>%
  summarise(count = n()) %>%
  filter(year == c("2022", "2021") )

CodePudding user response：

We may arrange the data by 'name', 'year', then grouped by 'name', filter the groups having the second count greater than the first and the 'year' is 2022

library(dplyr)
df %>% 
  arrange(name, year) %>% 
  group_by(name) %>%
  filter(nth(count, 2) > first(count), year == 2022) %>%
  ungroup

-output

# A tibble: 2 × 3
   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10

data

df <- structure(list(year = c(2021L, 2021L, 2021L, 2022L, 2022L, 2022L
), name = c("Bob", "Joe", "Meg", "Bob", "Joe", "Meg"), count = c(22L, 
3L, 4L, 30L, 1L, 10L)), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response：

Here is an alternative approach for your example:

library(dplyr)

df %>% 
  group_by(name) %>% 
  slice_max(count) %>% 
  filter(year == 2022)

   year name  count
  <int> <chr> <int>
1  2022 Bob      30
2  2022 Meg      10