Trying to read sqlite database to Dask dataframe

Time:04-22

I am trying to read a table from a SQLite database on Kaggle using Dask.

Link to the DB: https://www.kaggle.com/datasets/marcilonsilvacunha/amostracnpj?select=amostraCNPJ.sqlite

Some of the tables in this database are really large, and I want to test how Dask handles them. I wrote the following code for one of the tables in the smaller SQLite database:

import dask.dataframe as ddf
import sqlite3

# Read a SQLite table into a Dask DataFrame
con = sqlite3.connect("/kaggle/input/amostraCNPJ.sqlite")
df = ddf.read_sql_table('cnpj_dados_cadastrais_pj', con, index_col='cnpj')  

# Verify that the table contents are available in the dataframe
print(df.head())

This gives an error:

AttributeError: 'sqlite3.Connection' object has no attribute '_instantiate_plugins'

Any help would be appreciated, as this is the first time I have used Dask to read SQLite.

CodePudding user response:

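As the docstring states, you should not pass a connection object to Dask. The error arises because Dask hands that argument to SQLAlchemy, which expects a database URL rather than a live sqlite3 connection. You need to pass a SQLAlchemy-compatible connection string instead: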

df = ddf.read_sql_table('cnpj_dados_cadastrais_pj',
    'sqlite:////kaggle/input/amostraCNPJ.sqlite', index_col='cnpj')  
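The four slashes in that string are easy to get wrong, so here is a minimal sketch of how the URI can be built from a file path. The helper names sqlite_uri and read_table_with_dask, the demo.sqlite file, and the demo table are all made up for illustration; they are not part of Dask or of the Kaggle dataset. It assumes a POSIX-style path, and the Dask call additionally assumes dask, pandas and sqlalchemy are installed.

```python
import os
import sqlite3
import tempfile


def sqlite_uri(path):
    """Hypothetical helper: build a SQLAlchemy-style SQLite URI from a file path."""
    # "sqlite:///" plus an absolute POSIX path ("/...") gives four slashes in total.
    return "sqlite:///" + os.path.abspath(path)


def read_table_with_dask(table, path, index_col):
    """Hypothetical helper: read one table lazily with Dask, passing a URI string."""
    # Imported lazily so the rest of the sketch runs without dask installed.
    import dask.dataframe as ddf

    return ddf.read_sql_table(table, sqlite_uri(path), index_col=index_col)


# Create a tiny throwaway database so the sketch does not depend on the Kaggle file.
tmp_dir = tempfile.mkdtemp()
db_path = os.path.join(tmp_dir, "demo.sqlite")
con = sqlite3.connect(db_path)
con.execute("CREATE TABLE demo (id INTEGER PRIMARY KEY, name TEXT)")
con.executemany(
    "INSERT INTO demo (id, name) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)
con.commit()
con.close()

# This string, not the sqlite3 connection object, is what Dask should receive.
uri = sqlite_uri(db_path)
print(uri)
```

With Dask available, read_table_with_dask("demo", db_path, index_col="id").head() should then return the rows of the demo table. The index column should be numeric or a datetime so that Dask can work out partition boundaries on its own.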