I would like to use dplyr so that, when two rows have the same Label but a different Type, only the row with Type "big" is kept.
Current structure:
df <- data.frame(
  Label = c("A", "A", "B", "C", "C", "D"),
  Type  = c("big", "small", "big", "small", "tall", "short")
)
Desired df:
df_clean <- data.frame(
  Label = c("A", "B", "C", "D"),
  Type  = c("big", "big", "tall", "short")
)
The premise is that big > small and tall > small.
PS: my real data frame has other categories, but they also follow a hierarchy.
Thank you!
CodePudding user response:
Does this work? The output shown is for the four-row data (Label A, A, B, C; the same as df1 in the next answer):
library(dplyr)
df %>%
  group_by(Label) %>%
  filter(if (n() > 1) Type == 'big' else TRUE)
# A tibble: 3 x 2
# Groups:   Label [3]
  Label Type
  <chr> <chr>
1 A     big
2 B     big
3 C     small
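On the six-row df from the question, the filter above would drop group C entirely, because neither of its rows is "big". Here is a minimal sketch that ranks the types instead and keeps the best-ranked row per Label. The vector name `hierarchy` and the full ordering in it are assumptions for illustration; only big > small and tall > small come from the question.

```r
library(dplyr)

df <- data.frame(
  Label = c("A", "A", "B", "C", "C", "D"),
  Type  = c("big", "small", "big", "small", "tall", "short")
)

# Assumed ordering, highest priority first. Only big > small and
# tall > small are stated in the question; the rest is a placeholder.
hierarchy <- c("big", "tall", "short", "small")

df_clean <- df %>%
  group_by(Label) %>%
  # match() gives each Type its position in `hierarchy`;
  # the smallest position is the preferred row within each Label
  slice_min(match(Type, hierarchy), n = 1, with_ties = FALSE) %>%
  ungroup()

print(df_clean)
```

Extending this to more categories should only need more entries in `hierarchy`. A Type that is missing from the vector gets rank NA, so it is worth checking for those first.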
CodePudding user response:
df1 <- data.frame(Label = c("A", "A", "B", "C"),
                  Type  = c("big", "small", "big", "small"))
Relying on alphabetical ordering (this works here only because "big" sorts before "small"):
library(dplyr)
df1 %>%
  group_by(Label) %>%
  arrange(Type) %>%
  summarise(Type = first(Type))
Custom ordering:
library(dplyr)
df1 %>%
  mutate(Type = factor(Type, levels = c("big", "small"))) %>%
  group_by(Label) %>%
  arrange(Type) %>%
  summarise(Type = first(Type))
Result (with the custom ordering, Type prints as <fct> instead of <chr>):
  Label Type
  <chr> <chr>
1 A     big
2 B     big
3 C     small
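On the six-row data from the question, levels = c("big", "small") would turn "tall" and "short" into NA, so the factor needs every category. Below is a sketch of the same custom-ordering idea with a fuller set of levels. The name `type_levels` and the ordering beyond big > small and tall > small are assumptions, not something the question specifies.

```r
library(dplyr)

df <- data.frame(
  Label = c("A", "A", "B", "C", "C", "D"),
  Type  = c("big", "small", "big", "small", "tall", "short")
)

# Assumed level order, preferred type first (hypothetical beyond
# big > small and tall > small, which come from the question).
type_levels <- c("big", "tall", "short", "small")

df_clean <- df %>%
  mutate(Type = factor(Type, levels = type_levels)) %>%
  arrange(Label, Type) %>%                # factors sort by level order
  distinct(Label, .keep_all = TRUE) %>%   # keep the first row per Label
  mutate(Type = as.character(Type))       # back to plain character

print(df_clean)
```

Using distinct() with .keep_all = TRUE instead of summarise() keeps any other columns of the surviving row, which may matter for a wider real data frame.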