I've tried several approaches and searched for similar questions, but with no luck.
I'd like to arrange and de-duplicate my data frame with a custom rule: keep only one row per group, the one with the smallest val. But when 1 is available in val for a group, I'd like to keep 1 instead of the smallest value.
val is the value column and ID is the ID column:
library(dplyr)  # provides %>%, group_by(), arrange(), distinct(), slice_*()

x <- data.frame(
  ID  = c("a", "a", "b", "b", "c", "c", "d", "d"),
  val = c(1, 2, 0.5, 2, 1, 0.5, 5, 20)
)
x looks like:
ID val
1 a 1.0
2 a 2.0
3 b 0.5
4 b 2.0
5 c 1.0
6 c 0.5
7 d 5.0
8 d 20.0
I tried something like:
x %>% group_by(ID) %>% arrange(val) %>% distinct(ID, .keep_all = TRUE) %>% arrange(ID)
and it gives me:
ID val
1 a 1
2 b 0.5
3 c 0.5
4 d 5
I also tried slice_min():
x %>%
  group_by(ID) %>%
  slice_min(order_by = tibble(val != 1, val), n = 1, with_ties = FALSE) %>%
  ungroup()
and it gives me:
# A tibble: 3 × 2
ID val
<chr> <dbl>
1 a 1
2 c 1
3 d 5
Warning messages:
1: In xtfrm.data.frame(x) : cannot xtfrm data frames
2: In xtfrm.data.frame(x) : cannot xtfrm data frames
3: In xtfrm.data.frame(x) : cannot xtfrm data frames
4: In xtfrm.data.frame(x) : cannot xtfrm data frames
Desired output:
ID val
1 a 1
2 b 0.5
3 c 1
4 d 5
CodePudding user response:
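You can arrange by val != 1 and then by val, and use slice_head() on your grouped data. val != 1 is FALSE for rows where val is 1, and FALSE sorts before TRUE, so any row with val equal to 1 comes first; the remaining rows are ordered by val.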
x %>%
  group_by(ID) %>%
  arrange(val != 1, val) %>%
  slice_head(n = 1) %>%
  ungroup()
# A tibble: 4 × 2
ID val
<chr> <dbl>
1 a 1
2 b 0.5
3 c 1
4 d 5
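If only ID and val are needed in the result, the rule can also be written out directly with summarise(). This is just a sketch of an alternative, not a replacement for the approach above: it assumes an exact comparison val == 1 is what you want (as in the question), and it drops any other columns the data frame might have.

```r
library(dplyr)

x <- data.frame(
  ID  = c("a", "a", "b", "b", "c", "c", "d", "d"),
  val = c(1, 2, 0.5, 2, 1, 0.5, 5, 20)
)

# Keep 1 when the group contains it, otherwise fall back to the group minimum.
res <- x %>%
  group_by(ID) %>%
  summarise(val = if (any(val == 1)) 1 else min(val), .groups = "drop")

res
```

Use the arrange() plus slice_head() version instead when the whole row has to be kept.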
Or, using the development version of dplyr, you can use slice_min() with the order_by argument, which can take multiple variables via tibble():
x %>%
  group_by(ID) %>%
  slice_min(order_by = tibble(val != 1, val), n = 1, with_ties = FALSE) %>%
  ungroup()
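If your installed dplyr does not accept a tibble in order_by (the xtfrm warnings in the question suggest that is the case there), a possible workaround is to fold the rule into a single numeric sort key. The sketch below maps val == 1 to -Inf so it always wins the per-group minimum; this is my own variation rather than anything from the dplyr docs, and it assumes val never legitimately contains -Inf.

```r
library(dplyr)

x <- data.frame(
  ID  = c("a", "a", "b", "b", "c", "c", "d", "d"),
  val = c(1, 2, 0.5, 2, 1, 0.5, 5, 20)
)

# The key is only used for ordering; the val column itself is left untouched.
res_min <- x %>%
  group_by(ID) %>%
  slice_min(order_by = ifelse(val == 1, -Inf, val), n = 1, with_ties = FALSE) %>%
  ungroup()

res_min
```

Because the key is a plain numeric vector, slice_min() never has to order a data frame, and with_ties = FALSE keeps exactly one row per ID.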