Home > Net >  How to a row in a dataframe based on certain conditions
How to a row in a dataframe based on certain conditions

Time:08-11

I have some data that looks like this:

id ethnicity
1 white
2 south asian
2 other
3 other
4 white
4 south asian

as seen above there is potential for an id to have two ethnicity values. How would I go about removing these 'other' rows if that id already has an entry such as "white" or "south asian" while keeping the "white" or "south asian" entry?

I have noticed there are entries which also have south asian along with a white entry

My priority would be South Asian > White > Other in terms of keeping rows

So an expected output would be

id ethnicity
1 white
2 south asian
3 other
4 south asian

CodePudding user response:

If the intention is to get the prioritized 'ethnicity' per 'id', convert the column 'ethnicity' to ordered with levels specified in the order of preference, then do a group by 'id' and filter the first available level in that order

library(dplyr)
df2 %>%
  mutate(ethnicity = ordered(ethnicity, 
    c( "south asian", "white", "other"))) %>%
  group_by(id) %>% 
  filter(ethnicity %in% first(levels(droplevels(ethnicity)))) %>%
  ungroup

-output

# A tibble: 4 × 2
     id ethnicity  
  <int> <ord>      
1     1 white      
2     2 south asian
3     3 other      
4     4 south asian

data

df2 <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 4L), ethnicity = c("white", 
"south asian", "other", "other", "white", "south asian")),
 class = "data.frame", row.names = c(NA, 
-6L))
  •  Tags:  
  • r
  • Related