Home > Mobile >  R: Equivalent of "rank" and "partition"
R: Equivalent of "rank" and "partition"

Time:12-16

I am working with the R programming language. I have the following SQL query:

select b.var1 as var1, b.var2 from
(select *, rank() over( partition by var1 order by var3) as rank1
from my_table)b; 

The goal of this SQL code is to:

  • Find groups of records containing duplicate values of the same "var1"
  • For each of these groups, sort these records based on their values of "var3"
  • For each of these groups of sorted duplicate records, only keep the the record with the largest value of "var3"
  • Note: Records containing non-duplicate values of "var1" are left untouched

My Question: Does anyone know if it is possible to run this same code in R? For example:

library(RODBC)
library(dbi)
library(odbc)
library(sqldf)
library(SQLite)

dbWriteTable(my_db_connection, SQL("   select b.var1 as var1, b.var2 from
    (select *, rank() over( partition by var1 order by var3) as rank1
    from my_table)b " ), results_of_this_query)

Does anyone know if it is possible to do this in R? Does R recognize SQL commands like "rank()", "over" and "partition"?

Thanks!

CodePudding user response:

Consider ave with rank after you sort by partition and order columns:

df <- with(df, df[order(var1, var3),])

df$rank1 <- with(df, ave(1:nrow(df), var1, FUN=rank))

Add lambda for other arguments:

df$rank1 <- with(df, ave(1:nrow(df), var2, FUN=function(x) rank(x, ties.method="first", na.last="keep")))

CodePudding user response:

The Tidyverse collection of packages specially the dplyr can easy do this task for you on a grammar likely SQL.

The final code should be something like

df %>%
group_by(var1) %>%
mutate(rank = dense_rank(var3))

I may not understand you SQL-string and made the correctly translated because SQL language is not my strong skill.

  • Related