I have some individuals that are listed twice because they have received numerous degrees. I am trying to only get the rows with the latest degree granting date. Below are examples of the current output and the desired output
people | g_date | wage|quarter
personA|2009-01-01|100 |20201
personA|2009-01-01|100 |20202
personA|2010-01-01|100 |20201
personA|2010-01-01|100 |20202
personB|2012-01-01|50 |20201
personB|2012-01-01|50 |20202
personB|2012-01-01|50 |20203
Desired output
people | g_date | wage|quarter
personA|2010-01-01|100 |20201
personA|2010-01-01|100 |20202
personB|2012-01-01|50 |20201
personB|2012-01-01|50 |20202
personB|2012-01-01|50 |20203
I have used the code that is below but it is removing all the rows so that there is only one row per person.
df<-df[order(df$g_date),]
df<-df[!duplicated(df$people, fromLast = TRUE),]
CodePudding user response:
Another option using group_by
with ordered slice_max
like this:
library(dplyr)
df %>%
group_by(people, quarter) %>%
slice_max(order_by = g_date, n = 1)
#> # A tibble: 5 × 4
#> # Groups: people, quarter [5]
#> people g_date wage quarter
#> <chr> <chr> <dbl> <int>
#> 1 personA 2010-01-01 100 20201
#> 2 personA 2010-01-01 100 20202
#> 3 personB 2012-01-01 50 20201
#> 4 personB 2012-01-01 50 20202
#> 5 personB 2012-01-01 50 20203
Created on 2022-12-15 with reprex v2.0.2
CodePudding user response:
merge(df, aggregate(. ~ people, df[1:2], max))
#> people g_date wage quarter
#> 1 personA 2010-01-01 100 20201
#> 2 personA 2010-01-01 100 20202
#> 3 personB 2012-01-01 50 20201
#> 4 personB 2012-01-01 50 20202
#> 5 personB 2012-01-01 50 20203
CodePudding user response:
Update (thanks to @Villalba, removed first answer):
We colud first group arrange and then filter:
library(dplyr)
library(lubridate)
df %>%
group_by(people, quarter) %>%
mutate(g_date = ymd(g_date)) %>%
arrange(g_date, .by_group = TRUE) %>%
filter(row_number()==n())
people g_date wage quarter
<chr> <date> <int> <int>
1 personA 2010-01-01 100 20201
2 personA 2010-01-01 100 20202
3 personB 2012-01-01 50 20201
4 personB 2012-01-01 50 20202
5 personB 2012-01-01 50 20203