R: Equivalent of "rank" and "partition"-CodePudding

I am working with the R programming language. I have the following SQL query:

select b.var1 as var1, b.var2 from
(select *, rank() over( partition by var1 order by var3) as rank1
from my_table)b;

The goal of this SQL code is to:

Find groups of records containing duplicate values of the same "var1"
For each of these groups, sort these records based on their values of "var3"
For each of these groups of sorted duplicate records, only keep the the record with the largest value of "var3"
Note: Records containing non-duplicate values of "var1" are left untouched

My Question: Does anyone know if it is possible to run this same code in R? For example:

library(RODBC)
library(dbi)
library(odbc)
library(sqldf)
library(SQLite)

dbWriteTable(my_db_connection, SQL("   select b.var1 as var1, b.var2 from
    (select *, rank() over( partition by var1 order by var3) as rank1
    from my_table)b " ), results_of_this_query)

Does anyone know if it is possible to do this in R? Does R recognize SQL commands like "rank()", "over" and "partition"?

Thanks!

CodePudding user response：

Consider ave with rank after you sort by partition and order columns:

df <- with(df, df[order(var1, var3),])

df$rank1 <- with(df, ave(1:nrow(df), var1, FUN=rank))

Add lambda for other arguments:

df$rank1 <- with(df, ave(1:nrow(df), var2, FUN=function(x) rank(x, ties.method="first", na.last="keep")))

CodePudding user response：

The Tidyverse collection of packages specially the dplyr can easy do this task for you on a grammar likely SQL.

The final code should be something like

df %>%
group_by(var1) %>%
mutate(rank = dense_rank(var3))

I may not understand you SQL-string and made the correctly translated because SQL language is not my strong skill.