I have a dataset similar to this:
Year | ID | Type
2000 1 O
2000 1 O
2000 1 O
2000 1 O
2000 1 R
2017 5 O
2017 5 O
2000 8 R
2000 8 O
2002 8 O
I want to write code that groups the data by Year and ID (I imagine it would use dplyr), with one condition: if, within a given year, any row for an ID has Type R, the result for that group should be R. If the group contains only Type O, the result should be O.
Expected output:
Year | ID | Type
2000 1 R
2017 5 O
2000 8 R
2002 8 O
Thank you all
CodePudding user response:
We can arrange on a logical vector (FALSE sorts before TRUE, so the rows where Type == 'O' is FALSE, i.e. the R rows, come first within each Year and ID) and then slice the first row after grouping:
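library(dplyr)
df1 %>%
  # within each Year/ID, rows where Type == 'O' is FALSE (the R rows) sort first
  arrange(Year, ID, Type == 'O') %>%
  group_by(Year, ID) %>%
  # keep only the first row of each group
  slice_head(n = 1) %>%
  ungroup()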
-output
# A tibble: 4 × 3
   Year    ID Type
  <int> <int> <chr>
1  2000     1 R
2  2000     5 O
3  2000     8 R
4  2002     8 O
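To see why the arrange step puts the R rows first, here is a minimal base R sketch. The vector name type is made up for illustration and is not part of the original data; the point is only that a logical vector orders with FALSE before TRUE.

```r
# Hypothetical vector, for illustration only
type <- c("O", "O", "R", "O")

# The comparison yields a logical vector: TRUE for "O", FALSE for "R"
is_o <- type == "O"

# FALSE (0) orders before TRUE (1), so the position of "R" comes first
ord <- order(is_o)
sorted_type <- type[ord]
sorted_type
```

arrange(Year, ID, Type == 'O') applies the same ordering inside each Year and ID, which is why the first row of a group is an R row whenever one exists.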
Or, after the arrange, use distinct with .keep_all = TRUE, which keeps the first row for each Year and ID combination:
df1 %>%
arrange(Year, ID, Type == 'O') %>%
distinct(Year, ID, .keep_all = TRUE)
-output
  Year ID Type
1 2000  1    R
2 2000  5    O
3 2000  8    R
4 2002  8    O
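A third option, sketched here as an alternative and not part of the approaches above, states the condition from the question directly with summarise and any. It assumes Type only ever takes the values "O" and "R", and it rebuilds the same df1 with data.frame so the snippet runs on its own.

```r
library(dplyr)

# Same values as the df1 used in this answer
df1 <- data.frame(
  Year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2002L),
  ID   = c(1L, 1L, 1L, 1L, 1L, 5L, 5L, 8L, 8L, 8L),
  Type = c("O", "O", "O", "O", "R", "O", "O", "R", "O", "O")
)

res <- df1 %>%
  group_by(Year, ID) %>%
  # "R" if any row in the group is "R", otherwise "O"
  # (assumption: no other Type values occur)
  summarise(Type = if (any(Type == "R")) "R" else "O", .groups = "drop")

res
```

Unlike slice_head or distinct, this returns only the grouping columns plus Type, so it fits best when no other columns need to be carried along.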
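data (note that ID 5 has Year 2000 here, not 2017 as in the question, which is why the outputs above show 2000 for ID 5)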
df1 <- structure(list(Year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L,
2000L, 2000L, 2000L, 2002L), ID = c(1L, 1L, 1L, 1L, 1L, 5L, 5L,
8L, 8L, 8L), Type = c("O", "O", "O", "O", "R", "O", "O", "R",
"O", "O")), class = "data.frame", row.names = c(NA, -10L))