I need to apply a procedure in SQL that is easy for me since R, but has been really tortuous in SQL.
I need to sort the data from highest to lowest by two variables, group based on another variable, and select the first item in each group.
I leave the code that I am trying to pass from R to SQL. Unfortunately the dbplyr package throws me an error when trying to convert one language to another: Error: first()
is only available in a windowed (mutate()
) context
library(tidyverse)
library(dbplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")
mtcars2
mtcars2 %>%
arrange(-mpg,-disp) %>%
group_by(cyl) %>%
summarise(hp = first(hp)) %>%
show_query()
It seems to me that the DISTINCT ON function could help me.
Thanks for your help.
CodePudding user response:
Maybe the following?
library(tidyverse)
library(dbplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")
mtcars2 %>%
arrange(-mpg,-disp) %>%
group_by(cyl) %>%
mutate(hp = first(hp)) %>%
select(cyl, hp) %>%
distinct %>%
show_query
#> <SQL>
#> SELECT DISTINCT `cyl`, FIRST_VALUE(`hp`) OVER (PARTITION BY `cyl` ORDER BY -`mpg`, -`disp`) AS `hp`
#> FROM `mtcars`
#> ORDER BY -`mpg`, -`disp`