I have a sqlite3 database:
import sqlite3
import pandas as pd
from datetime import datetime
batch_size = 10
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
table = """CREATE TABLE my_table(DateCreated DATETIME);"""
cur.execute(table)
DateCreated = [datetime(2020, 12, 1, 1, 1)] * batch_size
x = list(pd.DataFrame({'DateCreated':DateCreated}).to_records(index=False))
query = """INSERT INTO my_table (DateCreated) values (?);"""
cur.executemany(query, x)
When I want to retrieve my datetimes, sqlite3 returns a byte string instead of a datetime object:
cur.execute("SELECT * FROM my_table;")
cur.fetchone()
>> (b'\x00\xf8\xea\x08\xf4qL\x16',)
What am I doing wrong? I have checked this issue but it is not really applicable to my problem. I also tried changing the sqlite datatype from DATETIME to TIMESTAMP, but received ValueError: not enough values to unpack (expected 2, got 1). I am using Python 3.8.10.
CodePudding user response:
There are two problems in your snippet. First, the correct declared type for the sqlite3 column is timestamp, not datetime. SQLite itself is, well, peculiar: it will accept just about any declared type without enforcing it - it is the Python driver that treats the declared type "timestamp" specially, via a registered converter, and is able to accept and return datetime objects directly.
That said, the df.to_records() call does not help either: it hands the driver numpy.datetime64 values, which sqlite3 does not know how to adapt, so they end up stored as raw bytes - that is the 8-byte blob you got back, and also why the timestamp converter raised ValueError when you tried TIMESTAMP. The correct thing is to retrieve each pandas cell value and call .to_pydatetime() on it.
So, if you need to convert some other dataframe (not the one given in the example) to sqlite, this sort of code works:
import sqlite3
import pandas as pd
from datetime import datetime
batch_size = 10
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
table = """CREATE TABLE my_table(DateCreated timestamp)"""
cur.execute(table)
DateCreated = [datetime(2020, 12, 1, 1, 1)] * batch_size
# pull plain datetime objects out of the pandas column instead of numpy datetime64 values
x = [(rec.to_pydatetime(),) for rec in pd.DataFrame({'DateCreated':DateCreated})["DateCreated"]]
query = """INSERT INTO my_table (DateCreated) values (?)"""
cur.executemany(query, x)
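A quick check with the same cursor shows the round trip now yields real datetime objects, because the driver's converter for the declared type "timestamp" kicks in on SELECT:
cur.execute("SELECT * FROM my_table")
cur.fetchone()
>> (datetime.datetime(2020, 12, 1, 1, 1),)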
The way I wrote the x = expression above will allow you to select the desired columns from your dataframe and call .to_pydatetime() only on the desired fields.
If your dataframe has 5 other columns, and the datetime is in the 3rd one [index 2], this works:
x = [(*rec.iloc[0:2], rec.iloc[2].to_pydatetime(), *rec.iloc[3:]) for _, rec in df.iterrows()]
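For instance, here is a minimal, self-contained sketch of that pattern; the other_table schema, the column names, and the dataframe contents are made up for illustration, with the timestamp sitting at index 2:
import sqlite3
import pandas as pd
from datetime import datetime

con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
# made-up table whose third column (index 2) holds the timestamp
cur.execute("CREATE TABLE other_table(name TEXT, score REAL, DateCreated timestamp, note TEXT)")

df = pd.DataFrame({
    'name': ['a', 'b'],
    'score': [1.5, 2.5],
    'DateCreated': [datetime(2020, 12, 1, 1, 1)] * 2,
    'note': ['x', 'y'],
})

# convert only the datetime cell (positional index 2) to a plain datetime object,
# leave the other fields as they are
x = [(*rec.iloc[0:2], rec.iloc[2].to_pydatetime(), *rec.iloc[3:]) for _, rec in df.iterrows()]
cur.executemany("INSERT INTO other_table VALUES (?, ?, ?, ?)", x)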
And if you are not converting things from a pandas workflow at all, there is no need to meddle with pandas; just create datetime objects as tuples and insert them:
x = [(datetime.now(),) for i in range(10)]
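These plain tuples can then be passed straight to executemany against the same table and cursor as above, for example:
cur.executemany(query, x)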
CodePudding user response:
The issue lies with my use of pandas' .to_records() method. If I cut that step out and supply a list of plain tuples instead of a list of numpy records (and declare the column as timestamp, as the other answer notes), then the database returns datetimes properly.
import sqlite3
import pandas as pd
from datetime import datetime
batch_size = 10
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
table = """CREATE TABLE my_table(DateCreated DATETIME);"""
cur.execute(table)
DateCreated = [(datetime(2020, 12, 1, 1, 1),)] * batch_size
query = """INSERT INTO my_table (DateCreated) values (?);"""
cur.executemany(query, DateCreated)
cur.execute("SELECT * FROM my_table;")
cur.fetchall()
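With the column declared as timestamp, this should now give back datetime objects rather than byte strings, e.g.:
>> [(datetime.datetime(2020, 12, 1, 1, 1),), (datetime.datetime(2020, 12, 1, 1, 1),), ...]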