I have 49 .db files. I want to open them in R and then store its content in a dataframe for further use. I am able to do it for one file but I want to modify the code to be able to do it for all the 49 .db file in one go. This is the code that I am trying to do for one file:
sqlite <- dbDriver("SQLite")
dbname <- "en_Whole_Blood.db"
db = dbConnect(sqlite,dbname)
wholeblood_df <- dbGetQuery(db,"SELECT * FROM weights")
View(wholeblood_df)
I tried to use the list.files function to do it for all the .db file but its not happening.Its only creating a dataframe for the last object This is the code for it:
library("RSQLite")
sqlite <- dbDriver("SQLite")
sqlite <- dbDriver("SQLite")
dbname <- data_files
dbname
for (i in length(dbname){
db=dbConnect(sqlite,dbname[i])
df <- dbGetQuery(db,"SELECT * FROM weights")
}
##This only gives me last .db file as a dataframe. Does anyone know how can I edit this code to get 49 dataframe for each sql file. Thank you.
CodePudding user response:
Try replacing the for loop with lapply
:
list_of_df <- lapply(dbname, function(x) {
db <- dbConnect(sqlite, x)
df <- dbGetQuery(db, "SELECT * FROM weights")
})
I'm not experience in handling SQL and / or connections, but I think it might work.
Edit
Second alternative maintaining the for loop:
df <- list()
for (i in 1:length(dbname)) {
db <- dbConnect(sqlite,dbname[i])
df <- c(df, dbGetQuery(db,"SELECT * FROM weights"))
}
Hope it helps
CodePudding user response:
Another suggestion:
files <- list.files(pattern = "\\.db$")
list_of_frames <- lapply(files, function(fn) {
db <- dbConnect(RSQLite::SQLite(), fn)
on.exit(dbDisconnect(db))
dbGetQuery(db, "select * from weights")
})
oneframe <- do.call(rbind, list_of_frames)
Reproducible example
Create data (you don't need this):
for (i in 1:3) {
db <- DBI::dbConnect(RSQLite::SQLite(), sprintf("mtcars%i.db", i))
DBI::dbWriteTable(db, "weights", mtcars[i * 5 1:3,], append = FALSE, create = TRUE)
DBI::dbDisconnect(db)
}
Working solution:
files <- list.files(pattern = "\\.db$")
files
# [1] "mtcars1.db" "mtcars2.db" "mtcars3.db"
list_of_frames <- lapply(files, function(fn) {
db <- dbConnect(RSQLite::SQLite(), fn)
on.exit(dbDisconnect(db))
dbGetQuery(db, "select * from mt")
})
list_of_frames
# [[1]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1
# 2 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4
# 3 24.4 4 146.7 62 3.69 3.19 20.00 1 0 4 2
# [[2]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 17.8 6 167.6 123 3.92 3.44 18.9 1 0 4 4
# 2 16.4 8 275.8 180 3.07 4.07 17.4 0 0 3 3
# 3 17.3 8 275.8 180 3.07 3.73 17.6 0 0 3 3
# [[3]]
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 2 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 3 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
oneframe <- do.call(rbind, list_of_frames)
oneframe
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
# 2 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
# 3 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
# 4 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
# 5 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
# 6 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
# 7 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
# 8 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
# 9 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
Tidyverse alternative:
library(dplyr) # just for %>%, could use magrittr as well
library(purrr) # map_dfr
oneframe <- files %>%
map_dfr(~ {
db <- DBI::dbConnect(RSQLite::SQLite(), .)
on.exit(DBI::dbDisconnect(db))
DBI::dbGetQuery(db, "select * from mt")
})
### same result