I have a sqlite3 database:
import sqlite3
import pandas as pd
from datetime import datetime
batch_size = 10
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
table = """CREATE TABLE my_table(DateCreated DATETIME);"""
cur.execute(table)
DateCreated = [datetime(2020, 12, 1, 1, 1)] * batch_size
x = list(pd.DataFrame({'DateCreated':DateCreated}).to_records(index=False))
query = """INSERT INTO my_table (DateCreated) values (?);"""
cur.executemany(query, x)
When I want to retrieve my datetimes, sqlite3 returns a byte string instead of a datetime object:
cur.execute("SELECT * FROM my_table;")
cur.fetchone()
>> (b'\x00\xf8\xea\x08\xf4qL\x16',)
What am I doing wrong? I have checked this issue but it is not really applicable to my problem. I also tried changing the sqlite datatype from DATETIME to TIMESTAMP, but received ValueError: not enough values to unpack (expected 2, got 1). I am using Python 3.8.10.
CodePudding user response:
There are two problems in your snippet. First, the correct declared type for the sqlite3 column is timestamp, not datetime. SQLite itself is, well, peculiar: it will accept just about any declared type without enforcing it - it is the Python driver that treats the declared type "timestamp" specially, via a registered converter, and is able to accept and return datetime objects directly.
That said, the df.to_records() call does not help either: it hands the driver numpy.datetime64 values, which sqlite3 does not know how to adapt, so they end up stored as raw bytes - that is the 8-byte blob you got back, and also why the timestamp converter raised ValueError when you tried TIMESTAMP. The correct thing is to retrieve each pandas cell value and call .to_pydatetime() on it.
So, if you need to convert some other dataframe (not the one given in the example) to sqlite, this sort of code works:
import sqlite3
import pandas as pd
from datetime import datetime
batch_size = 10
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
table = """CREATE TABLE my_table(DateCreated timestamp)"""
cur.execute(table)
DateCreated = [datetime(2020, 12, 1, 1, 1)] * batch_size
# pull plain datetime objects out of the pandas column instead of numpy datetime64 values
x = [(rec.to_pydatetime(),) for rec in pd.DataFrame({'DateCreated':DateCreated})["DateCreated"]]
query = """INSERT INTO my_table (DateCreated) values (?)"""
cur.executemany(query, x)
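A quick check with the same cursor shows the round trip now yields real datetime objects, because the driver's converter for the declared type "timestamp" kicks in on SELECT:
cur.execute("SELECT * FROM my_table")
cur.fetchone()
>> (datetime.datetime(2020, 12, 1, 1, 1),)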
The way I wrote the x = expression above will allow you to select the desired columns from your dataframe and call .to_pydatetime() only on the desired fields.
If your dataframe has 5 other columns, and the datetime is in the 3rd one [index 2], this works:
x = [(*rec.iloc[0:2], rec.iloc[2].to_pydatetime(), *rec.iloc[3:]) for _, rec in df.iterrows()]
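For instance, here is a minimal, self-contained sketch of that pattern; the other_table schema, the column names, and the dataframe contents are made up for illustration, with the timestamp sitting at index 2:
import sqlite3
import pandas as pd
from datetime import datetime

con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()
# made-up table whose third column (index 2) holds the timestamp
cur.execute("CREATE TABLE other_table(name TEXT, score REAL, DateCreated timestamp, note TEXT)")

df = pd.DataFrame({
    'name': ['a', 'b'],
    'score': [1.5, 2.5],
    'DateCreated': [datetime(2020, 12, 1, 1, 1)] * 2,
    'note': ['x', 'y'],
})

# convert only the datetime cell (positional index 2) to a plain datetime object,
# leave the other fields as they are
x = [(*rec.iloc[0:2], rec.iloc[2].to_pydatetime(), *rec.iloc[3:]) for _, rec in df.iterrows()]
cur.executemany("INSERT INTO other_table VALUES (?, ?, ?, ?)", x)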
And if you are not converting things from a pandas workflow at all, there is no need to meddle with pandas; just create datetime objects as tuples and insert them:
x = [(datetime.now(),) for i in range(10)]
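These plain tuples can then be passed straight to executemany against the same table and cursor as above, for example:
cur.executemany(query, x)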
CodePudding user response:
The issue lies with my use of pandas' .to_records() method. If I cut that step out and supply a list of plain tuples instead of a list of numpy records (and declare the column as timestamp, as the other answer notes), then the database returns datetimes properly.
import sqlite3
import pandas as pd
from datetime import datetime
batch_size = 10
con = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES)
cur = con.cursor()
table = """CREATE TABLE my_table(DateCreated DATETIME);"""
cur.execute(table)
DateCreated = [(datetime(2020, 12, 1, 1, 1),)] * batch_size
query = """INSERT INTO my_table (DateCreated) values (?);"""
cur.executemany(query, DateCreated)
cur.execute("SELECT * FROM my_table;")
cur.fetchall()
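With the column declared as timestamp, this should now give back datetime objects rather than byte strings, e.g.:
>> [(datetime.datetime(2020, 12, 1, 1, 1),), (datetime.datetime(2020, 12, 1, 1, 1),), ...]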