Trying to find the best way to do this. I am trying to output a table that only contains records that fulfills a condition of 2022 counts > 2021 counts. The table below is what I currently have from the original df. Not sure where to go next from here.
year | name | count |
---|---|---|
2021 | Bob | 22 |
2021 | Joe | 3 |
2021 | Meg | 4 |
2022 | Bob | 30 |
2022 | Joe | 1 |
2022 | Meg | 10 |
Resulting Table:
year | name | count |
---|---|---|
2022 | Bob | 30 |
2022 | Meg | 10 |
The script that I have so far that got me to the top df:
df %>%
group_by(year = format(as.Date(date, format = "%Y-%m-%d"), "%Y"), name) %>%
summarise(count = n()) %>%
filter(year == c("2022", "2021") )
CodePudding user response:
We may arrange
the data by 'name', 'year', then grouped by 'name', filter
the groups having the second count greater than the first
and the 'year' is 2022
library(dplyr)
df %>%
arrange(name, year) %>%
group_by(name) %>%
filter(nth(count, 2) > first(count), year == 2022) %>%
ungroup
-output
# A tibble: 2 × 3
year name count
<int> <chr> <int>
1 2022 Bob 30
2 2022 Meg 10
data
df <- structure(list(year = c(2021L, 2021L, 2021L, 2022L, 2022L, 2022L
), name = c("Bob", "Joe", "Meg", "Bob", "Joe", "Meg"), count = c(22L,
3L, 4L, 30L, 1L, 10L)), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
Here is an alternative approach for your example:
library(dplyr)
df %>%
group_by(name) %>%
slice_max(count) %>%
filter(year == 2022)
year name count
<int> <chr> <int>
1 2022 Bob 30
2 2022 Meg 10