Slice_min and Slice_max Tie Clarification

Time:10-09

When does dplyr return ties when using slice_min and slice_max? I'm seeing some inconsistencies and can't seem to find any clarification online or in their documentation.

Examples:

library(dplyr)

# there is a tie at 10.4, but this returns only 5 rows, not every row for the 5 lowest mpg values
mtcars %>% slice_min(mpg, n = 5, with_ties = TRUE)
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
#> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
#> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
#> Chrysler Imperial   14.7   8  440 230 3.23 5.345 17.42  0  0    3    4

# with n = 1 both tied rows are returned, although the same tie added no extra rows above
mtcars %>%
  slice_min(mpg, n = 1, with_ties = TRUE)
#>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
#> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4

#another example of it using ties to return more than 3 rows
starwars %>%
  select(gender, mass) %>%
  group_by(gender) %>%
  slice_min(mass, n = 3, with_ties = TRUE)
# A tibble: 8 x 2
# Groups:   gender [3]
#  gender     mass
#  <chr>     <dbl>
#1 feminine     45
#2 feminine     49
#3 feminine     50
#4 feminine     50
#5 masculine    15
#6 masculine    17
#7 masculine    20
#8 NA           48

Am I missing something here?

CodePudding user response:

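The "tie" refers only to the borderline entry, not to ties anywhere in the data. If the last row that makes it into the first n is tied with rows that would otherwise be excluded, with_ties = TRUE pulls those rows into the output as well. Ties that fall entirely inside the first n rows add nothing.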

my_data <- data.frame(a = c(1, 1, 2, 2))

> slice_min(my_data, a, n = 1)
  a
1 1
2 1
> slice_min(my_data, a, n = 2)
  a
1 1
2 1
> slice_min(my_data, a, n = 3)
  a
1 1
2 1
3 2
4 2
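
Applied to the question's mtcars call, a quick check shows why n = 5 returned exactly 5 rows: the 5th and 6th lowest mpg values differ, so nothing is tied at the cutoff. The min_rank() comparison below is only a sketch of how I think about with_ties = TRUE; it is an assumption that it selects the same rows, not a statement about how dplyr implements it.

```r
library(dplyr)

# The 5th and 6th lowest mpg values: no tie at the cutoff for n = 5.
sort(mtcars$mpg)[5:6]
#> [1] 14.7 15.0

# Assumption: with_ties = TRUE keeps the rows whose min_rank() is <= n.
# Tied values share the lowest rank of their group, so both 10.4 rows get
# rank 1. filter() does not sort, so only the set of rows is compared.
by_rank  <- mtcars %>% filter(min_rank(mpg) <= 5)
by_slice <- mtcars %>% slice_min(mpg, n = 5, with_ties = TRUE)

nrow(by_rank)
setequal(rownames(by_rank), rownames(by_slice))
```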

If you want every row for the three lowest distinct mpg values, you could start with the distinct mpgs, slice those, and join back to the original data (note that the join drops the row names):

mtcars %>%
  distinct(mpg) %>%
  slice_min(mpg, n = 3) %>%
  left_join(mtcars)

Joining, by = "mpg"
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
2 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
3 13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
4 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
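
A possible alternative that avoids the join is to filter on dense_rank(), which gives tied values the same rank and leaves no gaps between ranks. This is a minimal sketch, not the only way to do it; the name lowest3 is just for illustration.

```r
library(dplyr)

# dense_rank() assigns 1 to the lowest distinct value, 2 to the next, etc.,
# so ranks 1..3 cover the three lowest distinct mpg values and all their rows.
lowest3 <- mtcars %>%
  filter(dense_rank(mpg) <= 3)

sort(unique(lowest3$mpg))
#> [1] 10.4 13.3 14.3
```

Because filter() keeps the row names of a data frame, the car names survive here. The same filter also works per group after group_by().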

CodePudding user response:

The documentation of slice_min/slice_max says:

with_ties

Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties, and return the first n rows.

This means that when a tie straddles the cutoff, that is, when more rows share the n-th smallest (or largest) value than fit within n, all of them are returned and the output has more rows than you asked for.
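
For contrast, here is a small sketch of with_ties = FALSE on the question's mtcars data. The variable names are mine; the point is only that the result is capped at n rows, per group when the data is grouped.

```r
library(dplyr)

# with_ties = FALSE returns at most n rows, even though two cars share
# the minimum mpg of 10.4.
one_row <- mtcars %>%
  slice_min(mpg, n = 1, with_ties = FALSE)
nrow(one_row)
#> [1] 1

# On grouped data the cap applies within each group: one row per cyl value.
per_group <- mtcars %>%
  group_by(cyl) %>%
  slice_min(mpg, n = 1, with_ties = FALSE)
nrow(per_group)
#> [1] 3
```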

CodePudding user response:

slice_min/slice_max can be surprising when the column holds only a single distinct value. Every row is then tied with every other row, so with the default with_ties = TRUE all rows are returned regardless of n; if the data has 10000 rows, you get all 10000 back.

dat <- tibble(a = rep(1, 5))
> slice_min(dat, a, n = 1)
# A tibble: 5 × 1
      a
  <dbl>
1     1
2     1
3     1
4     1
5     1

> slice_min(dat, a, n = 1, with_ties = TRUE)
# A tibble: 5 × 1
      a
  <dbl>
1     1
2     1
3     1
4     1
5     1

If there are duplicate values, one option is to arrange and use slice, which always returns exactly the rows you index:

mtcars %>%
    arrange(desc(mpg)) %>%
    slice(1)

We can also get every row for the three highest (or three lowest) distinct values with a single filter:

mtcars %>% filter(mpg %in% tail(unique(sort(mpg)), 3))
                mpg cyl disp  hp drat    wt  qsec vs am gear carb
Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2
mtcars %>% filter(mpg %in% head(unique(sort(mpg)), 3))
                     mpg cyl disp  hp drat    wt  qsec vs am gear carb
Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4