Home > Software design >  R: SQL query for mean extraction
R: SQL query for mean extraction

Time:04-24

I'd like to aggregate CD mean by CD_TALHAO, ID_UNIQUE and DATA_S2 using a SQL query with glue package. But when I try:


library(dplyr)
library(rgdal)
library(rgeos)
library(DBI)
library(glue)

# get AOI
download.file(
  "https://github.com/Leprechault/trash/raw/main/stands_example.zip",
  zip_path <- tempfile(fileext = ".zip")
)
unzip(zip_path, exdir = tempdir())

# Open the files 
setwd(tempdir())
stands_ds <- read.csv("pred_target_stands.csv", sep=";") # Data set
stands_ds <- stands_ds %>%
  mutate(DATA_S2 = ymd(DATA_S2))
stands_ds$CLASS<-c(rep("A",129),rep("B",130)) 
stands_ds$CD<-abs(rnorm(length(stands_ds[,1]),mean=50))

# Crete like a SQL server condition
bq_conn<- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(bq_conn, "stands_ds", stands_ds)

# Aggregate CD mean by CD_TALHAO, ID_UNIQUE and DATA_S2
sqlInput_pred_FARM <- glue::glue_sql("SELECT * FROM stands_ds AVG(CD) GROUP BY CD_TALHAO = {x} AND ID_UNIQUE = {y} AND DATA_S2 = {z}",
 x = "001G", y = "CERROCOROADO_001G", 
 z = "2021-04-02",.con=bq_conn)
pred_attack_BQ_FARM  <- dbGetQuery(bq_conn, as.character(sqlInput_pred_FARM, stringsAsFactors = T))


I always have Error: near "(": syntax error as output. Please, any help with it?

CodePudding user response:

Try the following:

statement <- "SELECT * FROM stands_ds AVG(CD) 
              GROUP BY CD_TALHAO = ? AND ID_UNIQUE = ? AND DATA_S2 = ?"

pars <- list("001G", "CERROCOROADO_001G","2021-04-02")

pred_attack_BQ_FARM  <- dbGetQuery(bq_conn, statement, params = pars)

CodePudding user response:

Sorry @KU99 but your solution doesn't return the mean values as I expected. Now, I try something new like an object creation for my mean operation and despite some ugly results with REPLICATE(DATE()), now works. The solution is:

# Aggregate CD mean by CD_TALHAO, ID_UNIQUE and DATA_S2
sqlInput_pred_FARM <- glue::glue_sql("SELECT REPLICATE(CD_TALHAO,1) AS TALHAO, REPLICATE(ID_UNIQUE,1) AS ID, REPLICATE(DATE(DATA_S2),1) AS DATE, AVG(CD) AS CD FROM stands_ds GROUP BY CD_TALHAO = {x},ID_UNIQUE = {y}, DATA_S2 = {z} ORDER BY CD_TALHAO = {x},ID_UNIQUE = {y}, DATA_S2 = {z}",
                                      x = "001G", y = "CERROCOROADO_001G", 
                                      z = "2021-04-02",.con=bq_conn)
pred_attack_BQ_FARM  <- dbGetQuery(bq_conn, as.character(sqlInput_pred_FARM, stringsAsFactors = T))
pred_attack_BQ_FARM 
#  TALHAO                ID        DATE       CD
#1   001C CERROCOROADO_001C -4661-02-24 49.93823
#2   001G CERROCOROADO_001G -4661-02-24 50.12102
  • Related