I am working with the R programming language. I have the following SQL query:
select b.var1 as var1, b.var2 from
(select *, rank() over( partition by var1 order by var3) as rank1
from my_table)b;
The goal of this SQL code is to:
- Find groups of records containing duplicate values of the same "var1"
- For each of these groups, sort these records based on their values of "var3"
- For each of these groups of sorted duplicate records, only keep the the record with the largest value of "var3"
- Note: Records containing non-duplicate values of "var1" are left untouched
My Question: Does anyone know if it is possible to run this same code in R? For example:
library(RODBC)
library(dbi)
library(odbc)
library(sqldf)
library(SQLite)
dbWriteTable(my_db_connection, SQL(" select b.var1 as var1, b.var2 from
(select *, rank() over( partition by var1 order by var3) as rank1
from my_table)b " ), results_of_this_query)
Does anyone know if it is possible to do this in R? Does R recognize SQL commands like "rank()", "over" and "partition"?
Thanks!
CodePudding user response:
Consider ave
with rank
after you sort by partition and order columns:
df <- with(df, df[order(var1, var3),])
df$rank1 <- with(df, ave(1:nrow(df), var1, FUN=rank))
Add lambda for other arguments:
df$rank1 <- with(df, ave(1:nrow(df), var2, FUN=function(x) rank(x, ties.method="first", na.last="keep")))
CodePudding user response:
The Tidyverse collection of packages specially the dplyr can easy do this task for you on a grammar likely SQL.
The final code should be something like
df %>%
group_by(var1) %>%
mutate(rank = dense_rank(var3))
I may not understand you SQL-string and made the correctly translated because SQL language is not my strong skill.