Grouping data based on a condition (containing a specific string)

Time:11-09


I have a dataset similar to this:

Year | ID | Type
2000 | 1  | O
2000 | 1  | O
2000 | 1  | O
2000 | 1  | O
2000 | 1  | R
2017 | 5  | O
2017 | 5  | O
2000 | 8  | R
2000 | 8  | O
2002 | 8  | O

I want to write code that groups the data by Year and ID (I imagine it would use dplyr), but with a condition: if, in a given year, an ID has any row of type R, the output should be R. If it has only type O rows, the output should be O.

Expected output:
Year | ID | Type
2000 | 1  | R
2017 | 5  | O
2000 | 8  | R
2002 | 8  | O

Thank you all

Answer:

We can arrange() on a logical vector (FALSE sorts before TRUE, because logicals are ordered like 0 and 1), so that within each Year/ID the R rows come first, and then slice the first row after grouping.
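To see why this works: `Type == 'O'` is FALSE for R rows and TRUE for O rows, and ordering that logical key puts the R row first. A small base R check on a made-up vector (the values in `type` are hypothetical, for illustration only):

```r
# Hypothetical Type values for a single Year/ID group
type <- c("O", "O", "R", "O")

# Logical sort key: FALSE for "R", TRUE for "O"
key <- type == "O"

# order() places FALSE (0) before TRUE (1), so the "R" row comes first;
# ties keep their original relative order
sorted_type <- type[order(key)]
print(sorted_type)
```

Taking the first element of `sorted_type` gives "R" whenever the group contains an R, and "O" otherwise, which is exactly what `slice_head(n = 1)` does per group.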

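library(dplyr)

df1 %>%
   # Type == 'O' is FALSE for R rows, so R rows sort first within each Year/ID
   arrange(Year, ID, Type == 'O') %>%
   group_by(Year, ID) %>%
   # keep only the first row of each Year/ID group
   slice_head(n = 1) %>%
   ungroup()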

Output:

# A tibble: 4 × 3
   Year    ID Type 
  <int> <int> <chr>
1  2000     1 R    
2  2000     5 O    
3  2000     8 R    
4  2002     8 O    

Alternatively, after arrange(), use distinct() with .keep_all = TRUE, which keeps the first row of each unique Year/ID combination (and all of its columns):

df1 %>%
    arrange(Year, ID, Type == 'O') %>%
    distinct(Year, ID, .keep_all = TRUE)

Output:

 Year ID Type
1 2000  1    R
2 2000  5    O
3 2000  8    R
4 2002  8    O
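The same "sort, then keep the first row per group" idea can also be written in base R without dplyr, using `order()` and `duplicated()`. A minimal sketch, rebuilding the answer's `df1` inline so it runs on its own (`sorted` and `result` are illustrative names):

```r
# Same values as df1 in this answer
df1 <- data.frame(
  Year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 2002L),
  ID   = c(1L, 1L, 1L, 1L, 1L, 5L, 5L, 8L, 8L, 8L),
  Type = c("O", "O", "O", "O", "R", "O", "O", "R", "O", "O"),
  stringsAsFactors = FALSE
)

# Sort by Year, then ID, then put R rows first (Type == "O" is FALSE for R)
sorted <- df1[order(df1$Year, df1$ID, df1$Type == "O"), ]

# duplicated() on the Year/ID columns flags every row after the first in
# each group, so negating it keeps one row per Year/ID
result <- sorted[!duplicated(sorted[c("Year", "ID")]), ]
rownames(result) <- NULL
print(result)
```

This mirrors `arrange()` followed by `distinct(Year, ID, .keep_all = TRUE)`: `duplicated()` keeps the first occurrence, so the sort order decides which row survives.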

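Data (note: in this df1, ID 5 is recorded in 2000, whereas the question's table has 2017, which is why the outputs above show 2000 for ID 5):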

df1 <- structure(list(Year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2000L, 
2000L, 2000L, 2000L, 2002L), ID = c(1L, 1L, 1L, 1L, 1L, 5L, 5L, 
8L, 8L, 8L), Type = c("O", "O", "O", "O", "R", "O", "O", "R", 
"O", "O")), class = "data.frame", row.names = c(NA, -10L))
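A more literal translation of the rule in the question ("R if any row is R, otherwise O") is to summarise each group instead of sorting. A minimal sketch, assuming dplyr 1.0.0 or later (for the `.groups` argument) and that Type only ever contains "O" or "R", as in the question; it uses the question's table, including Year 2017 for ID 5:

```r
library(dplyr)

# Data from the question's table (Year is 2017 for ID 5 here)
df <- data.frame(
  Year = c(2000L, 2000L, 2000L, 2000L, 2000L, 2017L, 2017L, 2000L, 2000L, 2002L),
  ID   = c(1L, 1L, 1L, 1L, 1L, 5L, 5L, 8L, 8L, 8L),
  Type = c("O", "O", "O", "O", "R", "O", "O", "R", "O", "O"),
  stringsAsFactors = FALSE
)

# One row per Year/ID: "R" if any row in the group is "R", otherwise "O".
# Inside summarise(), Type on the right-hand side still refers to the
# group's original column before it is replaced by the summary value.
out <- df %>%
  group_by(Year, ID) %>%
  summarise(Type = if (any(Type == "R")) "R" else "O", .groups = "drop")

print(out)
```

The result is sorted by the grouping keys (Year, then ID), so the 2017 row comes last rather than in the order shown in the question's expected output.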