Filter column based on other character column (in R)

I would like to use dplyr so that when two rows have the same Label but different Type, only the one with type "big" is kept.

Current structure

df <- data.frame(Label = c("A", "A", "B", "C", "C", "D"), Type = c("big", "small", "big", "small", "tall", "short"))

Desired df

df_clean <- data.frame(Label = c("A", "B", "C", "D"), Type = c("big", "big", "tall", "short"))

The premise is that big > small and tall > small.

PS: my real data frame has other categories, but they follow hierarchies as well.

Thank you!

CodePudding user response:

Does this work?

library(dplyr)

# In groups with more than one row keep only "big"; single-row groups are kept as-is
df %>% group_by(Label) %>% filter(if (n() > 1) Type == "big" else TRUE)
# A tibble: 3 x 2
# Groups:   Label [3]
  Label Type 
  <chr> <chr>
1 A     big  
2 B     big  
3 C     small
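
One caveat with the filter above: when a multi-row group has no "big" row (Label C in the question's df, which holds "small" and "tall"), every row of that group is dropped. Below is a minimal sketch of a rank-based variant. The `priority` vector is an assumption: the question only states big > small and tall > small, so where "short" sits, and whether "big" outranks "tall", are guesses to adjust to the real hierarchy.

```r
library(dplyr)

df <- data.frame(
  Label = c("A", "A", "B", "C", "C", "D"),
  Type  = c("big", "small", "big", "small", "tall", "short")
)

# Assumed priority, best first (only big > small and tall > small are given)
priority <- c("big", "tall", "short", "small")

df_clean <- df %>%
  group_by(Label) %>%
  # match() maps each Type to its position in `priority` (1 = best);
  # keep the row(s) holding the best position within each Label
  filter(match(Type, priority) == min(match(Type, priority))) %>%
  ungroup()

df_clean
```

On the question's df this gives A/big, B/big, C/tall, D/short. Every Type has to appear in `priority`: an unlisted value becomes NA in match() and its whole group is dropped. If a Label has the same best Type twice, both rows are kept.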

CodePudding user response:

df1 <- data.frame(Label = c("A", "A", "B", "C"),
                  Type = c("big", "small", "big", "small"))

Relying on alphabetical ordering ("big" sorts before "small"):

library(dplyr)
df1 %>% 
  group_by(Label) %>% 
  arrange(Type) %>%                 # sort rows so "big" comes before "small"
  summarise(Type = first(Type))     # keep the first Type within each Label

Using a custom ordering via factor levels:

library(dplyr)
df1 %>% 
  # Factor levels define the priority: earlier levels sort first
  mutate(Type = factor(Type, levels = c("big", "small"))) %>% 
  group_by(Label) %>% 
  arrange(Type) %>% 
  summarise(Type = first(Type))

Result (with the custom ordering, Type is printed as <fct> instead of <chr>):

  Label Type 
  <chr> <chr>
1 A     big  
2 B     big  
3 C     small
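
Since the real data frame has more columns, keep in mind that summarise() returns only Label and the summarised Type. Here is a sketch that keeps whole rows instead, applied to the question's df. The level order in `type_levels` and the `Value` column are both hypothetical, added only for illustration; the question gives just big > small and tall > small.

```r
library(dplyr)

df <- data.frame(
  Label = c("A", "A", "B", "C", "C", "D"),
  Type  = c("big", "small", "big", "small", "tall", "short"),
  Value = c(10, 20, 30, 40, 50, 60)  # hypothetical extra column
)

# Assumed hierarchy, best first; adjust to the real categories
type_levels <- c("big", "tall", "short", "small")

df_clean <- df %>%
  mutate(Type = factor(Type, levels = type_levels)) %>%
  arrange(Label, Type) %>%                 # best Type first within each Label
  distinct(Label, .keep_all = TRUE) %>%    # keep the first row per Label
  mutate(Type = as.character(Type))        # back to plain character

df_clean
```

distinct() with `.keep_all = TRUE` keeps the first row it sees for each Label, so after sorting by the factor that is the highest-priority row, together with its other columns.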