How to write pandas date column to Databricks SQL database


I have a pandas DataFrame column that holds string values in the format YYYY-MM-DD HH:MM:SS.fffffff, for example 2021-12-26 21:10:18.6766667. I have verified that every value follows this format, with seven digits of fractional seconds. But the following code throws a conversion error (shown below) when it tries to insert the data into an Azure SQL database from Databricks:

Conversion failed when converting date and/or time from character string

Question: What could be a cause of the error and how can we fix it?

Remark: After the conversion, the initial value (for example 2021-12-26 21:10:18.6766667) even gains two more digits at the end, becoming 2021-12-26 21:10:18.676666700, i.e. nine fractional-second digits.
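That padding is pandas' doing: datetime64[ns] stores timestamps at nanosecond resolution, so a seven-digit fraction is padded with two trailing zeros. A quick standalone illustration (separate from the script below):

import pandas as pd

# A 7-digit fraction is stored as nanoseconds (9 digits), padded with zeros.
ts = pd.Timestamp('2021-12-26 21:10:18.6766667')
print(ts)  # 2021-12-26 21:10:18.676666700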

import sqlalchemy as sq
import pandas as pd

# 'engine' is assumed to be an existing SQLAlchemy engine; see the sketch below.
data_df = pd.read_csv('/dbfs/FileStore/tables/myDataFile.csv',
                      low_memory=False, quotechar='"', header='infer')

# Cast the string column to pandas datetime (nanosecond resolution)
data_df['OrderDate'] = data_df['OrderDate'].astype('datetime64[ns]')

data_df.to_sql(name='CustomerOrderTable', con=engine, if_exists='append', index=False,
               dtype={'OrderID': sq.VARCHAR(10),
                      'Name': sq.VARCHAR(50),
                      'OrderDate': sq.DATETIME()})
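The script references engine without defining it; presumably it is created beforehand. A sketch of what that might look like for Azure SQL over pyodbc (server, database, and credentials here are placeholders, not from the question):

# Hypothetical connection details; replace with your own server/database/credentials.
engine = sq.create_engine(
    'mssql+pyodbc://user:password@myserver.database.windows.net:1433/mydb'
    '?driver=ODBC+Driver+17+for+SQL+Server'
)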

CodePudding user response:

The error is raised by SQL Server (the engine behind Azure SQL), not by pandas or Databricks. Its legacy DATETIME type stores at most three fractional-second digits, so a value like 2021-12-26 21:10:18.6766667 (seven digits), let alone the nine-digit nanosecond form noted in the remark, cannot be converted into a DATETIME column.
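One option, if to_sql is creating the table (against an already existing table the declared dtype has no effect), is to declare the column as SQL Server's DATETIME2, which holds up to seven fractional-second digits. A sketch using SQLAlchemy's MSSQL dialect, reusing the table and column names from the question:

from sqlalchemy.dialects.mssql import DATETIME2

# DATETIME2(7) preserves all seven fractional-second digits of the source data.
data_df.to_sql(name='CustomerOrderTable', con=engine, if_exists='append', index=False,
               dtype={'OrderID': sq.VARCHAR(10),
                      'Name': sq.VARCHAR(50),
                      'OrderDate': DATETIME2(precision=7)})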


To make the values fit the legacy DATETIME type instead, convert them with to_datetime and reformat them into a three-digit (millisecond) string using dt.strftime:

df['OrderDate'] = pd.to_datetime(df['OrderDate']).dt.strftime('%Y-%m-%d %H:%M:%S.%f').str[:-3]
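Here %f formats the fractional seconds as six digits (microseconds) and .str[:-3] trims them to three, so for example 2021-12-26 21:10:18.6766667 becomes 2021-12-26 21:10:18.676, which DATETIME accepts.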

Or if the column is already datetime, use dt.strftime directly:

df['OrderDate'] = df['OrderDate'].dt.strftime('%Y-%m-%d %H:%M:%S.%f').str[:-3]