How can I rewrite a dplyr::top_n() call with weight using a dplyr::slice_* function

Time:01-10

I would like to replace the superseded top_n() call in the code below with the recommended slice_max() function, but I don't see how to specify a weight with slice_max().

top10 <- 
  structure(
    list(
      Variable = c("tfidf_text_crossing", "tfidf_text_best", 
                   "tfidf_text_amazing", "tfidf_text_fantastic",
                   "tfidf_text_player", "tfidf_text_great",
                   "tfidf_text_10", "tfidf_text_progress", 
                   "tfidf_text_relaxing", "tfidf_text_fix"), 
      Importance = c(0.428820580430941, 0.412741988094224,
                     0.368676982306671, 0.361409225854695, 
                     0.331176924533776, 0.307393456208119,
                     0.293945850296236, 0.286313554816565, 
                     0.283457020779205, 0.27899280757397), 
      Sign = c(tfidf_text_crossing = "POS", tfidf_text_best = "POS", 
               tfidf_text_amazing = "POS", tfidf_text_fantastic = "POS", 
               tfidf_text_player = "NEG", tfidf_text_great = "POS", 
               tfidf_text_10 = "POS", tfidf_text_progress = "NEG", 
               tfidf_text_relaxing = "POS", tfidf_text_fix = "NEG")
    ), 
    row.names = c(NA, -10L), 
    class = c("vi", "tbl_df", "tbl", "data.frame"), 
    type = "|coefficient|"
  )

suppressPackageStartupMessages(library(dplyr))

top10 |> 
  group_by(Sign) |> 
  top_n(2, wt = abs(Importance))
#> # A tibble: 4 × 3
#> # Groups:   Sign [2]
#>   Variable            Importance Sign 
#>   <chr>                    <dbl> <chr>
#> 1 tfidf_text_crossing      0.429 POS  
#> 2 tfidf_text_best          0.413 POS  
#> 3 tfidf_text_player        0.331 NEG  
#> 4 tfidf_text_progress      0.286 NEG

Created on 2023-01-06 with reprex v2.0.2

I think I will get the correct answers with:

top10 |> 
  group_by(Sign) |> 
  arrange(desc(abs(Importance))) |> 
  slice_head(n = 2)

but that is far less readable for the novices I am teaching. Is there an obvious way to do this with one of the slice_*() functions?
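One caveat on the arrange() + slice_head() version: it matches top_n() only when there are no ties. top_n() keeps tied rows, so it can return more than n rows per group, while slice_head() always stops at n. The sketch below uses a small hypothetical data frame `tied` (not the data from the question) with a tie for second place. slice_max() keeps ties by default, like top_n(); its `with_ties = FALSE` argument gives the slice_head() behaviour instead.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical data: "b" and "c" tie for second place
tied <- tibble(
  Variable   = c("a", "b", "c", "d"),
  Importance = c(0.9, 0.5, 0.5, 0.1),
  Sign       = "POS"
)

# arrange() + slice_head(): always exactly n rows per group
by_head <- tied |>
  group_by(Sign) |>
  arrange(desc(abs(Importance))) |>
  slice_head(n = 2)

# top_n(): keeps both tied rows, so 3 rows come back
by_top_n <- tied |>
  group_by(Sign) |>
  top_n(2, wt = abs(Importance))

# slice_max(): keeps ties by default, like top_n()
by_max <- tied |>
  group_by(Sign) |>
  slice_max(order_by = abs(Importance), n = 2)

# slice_max() with with_ties = FALSE: behaves like slice_head()
by_max_no_ties <- tied |>
  group_by(Sign) |>
  slice_max(order_by = abs(Importance), n = 2, with_ties = FALSE)

nrow(by_head)         # 2
nrow(by_top_n)        # 3
nrow(by_max)          # 3
nrow(by_max_no_ties)  # 2
```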

Answer:

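You can handle the ordering of the data with the order_by argument, which should make it more readable (and it closely mirrors your top_n() code). Note that slice_max() returns the rows sorted by group, which is why the NEG rows come first in the output below.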

top10 |>
  group_by(Sign) |>
  slice_max(n = 2, order_by = abs(Importance))
# # A tibble: 4 × 3
# # Groups:   Sign [2]
#   Variable            Importance Sign 
#   <chr>                    <dbl> <chr>
# 1 tfidf_text_player        0.331 NEG  
# 2 tfidf_text_progress      0.286 NEG  
# 3 tfidf_text_crossing      0.429 POS  
# 4 tfidf_text_best          0.413 POS  
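To check that the two calls pick the same rows, here is a minimal sketch on a small hypothetical data frame `toy` (not the data from the question). The expression passed to top_n()'s `wt` goes unchanged into slice_max()'s `order_by`. The selected rows are the same, but top_n() (a filter) keeps the original row order, while slice_max() returns rows grouped by `Sign` and sorted in descending order within each group. That is why the comparison sorts the names first.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data, not the data from the question
toy <- tibble(
  Variable   = c("a", "b", "c", "d", "e", "f"),
  Importance = c(0.5, -0.9, 0.2, -0.4, 0.7, 0.1),
  Sign       = c("POS", "NEG", "POS", "NEG", "POS", "NEG")
)

# Superseded: top_n() with a weighting expression
old <- toy |>
  group_by(Sign) |>
  top_n(2, wt = abs(Importance))

# Recommended: slice_max() with the same expression in order_by
new <- toy |>
  group_by(Sign) |>
  slice_max(order_by = abs(Importance), n = 2)

old$Variable  # original row order: "a" "b" "d" "e"
new$Variable  # group order, largest first: "b" "d" "e" "a"

# Same rows once the ordering difference is removed
identical(sort(old$Variable), sort(new$Variable))  # TRUE
```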