Home > Software engineering >  How to create multiple dataframes in R from different sql files
How to create multiple dataframes in R from different sql files

Time:12-24

I have 49 .db files. I want to open them in R and then store its content in a dataframe for further use. I am able to do it for one file but I want to modify the code to be able to do it for all the 49 .db file in one go. This is the code that I am trying to do for one file:

sqlite <- dbDriver("SQLite")
dbname <- "en_Whole_Blood.db" 
db = dbConnect(sqlite,dbname)
wholeblood_df <- dbGetQuery(db,"SELECT * FROM weights") 
View(wholeblood_df)

I tried to use the list.files function to do it for all the .db file but its not happening.Its only creating a dataframe for the last object This is the code for it:

library("RSQLite")
sqlite <- dbDriver("SQLite")
sqlite <- dbDriver("SQLite")
dbname <- data_files
dbname
for (i in length(dbname){
db=dbConnect(sqlite,dbname[i])
df <- dbGetQuery(db,"SELECT * FROM weights")
}

##This only gives me last .db file as a dataframe. Does anyone know how can I edit this code to get 49 dataframe for each sql file. Thank you.

CodePudding user response:

Try replacing the for loop with lapply:

list_of_df <- lapply(dbname, function(x) {
db <- dbConnect(sqlite, x)
df <- dbGetQuery(db, "SELECT * FROM weights")
})

I'm not experience in handling SQL and / or connections, but I think it might work.

Edit

Second alternative maintaining the for loop:

df <- list()
for (i in 1:length(dbname)) {
db <- dbConnect(sqlite,dbname[i])
df <- c(df, dbGetQuery(db,"SELECT * FROM weights"))
}

Hope it helps

CodePudding user response:

Another suggestion:

files <- list.files(pattern = "\\.db$")
list_of_frames <- lapply(files, function(fn) {
  db <- dbConnect(RSQLite::SQLite(), fn)
  on.exit(dbDisconnect(db))
  dbGetQuery(db, "select * from weights")
})
oneframe <- do.call(rbind, list_of_frames)

Reproducible example

Create data (you don't need this):

for (i in 1:3) {
  db <- DBI::dbConnect(RSQLite::SQLite(), sprintf("mtcars%i.db", i))
  DBI::dbWriteTable(db, "weights", mtcars[i * 5   1:3,], append = FALSE, create = TRUE)
  DBI::dbDisconnect(db)
}

Working solution:

files <- list.files(pattern = "\\.db$")
files
# [1] "mtcars1.db" "mtcars2.db" "mtcars3.db"
list_of_frames <- lapply(files, function(fn) {
  db <- dbConnect(RSQLite::SQLite(), fn)
  on.exit(dbDisconnect(db))
  dbGetQuery(db, "select * from mt")
})
list_of_frames
# [[1]]
#    mpg cyl  disp  hp drat   wt  qsec vs am gear carb
# 1 18.1   6 225.0 105 2.76 3.46 20.22  1  0    3    1
# 2 14.3   8 360.0 245 3.21 3.57 15.84  0  0    3    4
# 3 24.4   4 146.7  62 3.69 3.19 20.00  1  0    4    2
# [[2]]
#    mpg cyl  disp  hp drat   wt qsec vs am gear carb
# 1 17.8   6 167.6 123 3.92 3.44 18.9  1  0    4    4
# 2 16.4   8 275.8 180 3.07 4.07 17.4  0  0    3    3
# 3 17.3   8 275.8 180 3.07 3.73 17.6  0  0    3    3
# [[3]]
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
# 2 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
# 3 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

oneframe <- do.call(rbind, list_of_frames)
oneframe
#    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
# 2 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
# 3 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
# 4 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
# 5 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
# 6 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
# 7 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
# 8 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
# 9 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1

Tidyverse alternative:

library(dplyr) # just for %>%, could use magrittr as well
library(purrr) # map_dfr
oneframe <- files %>%
  map_dfr(~ {
    db <- DBI::dbConnect(RSQLite::SQLite(), .)
    on.exit(DBI::dbDisconnect(db))
    DBI::dbGetQuery(db, "select * from mt")
  })
### same result
  • Related