Problem
I have a pandas dataframe and I'm trying to use the DataFrame.to_sql()
function to write it to an Oracle database. My Oracle database is 19c (version 19.3).
Seems easy enough, right? Why won't it work?
I saw in a few other Stack Overflow posts that I should be using SQLAlchemy datatypes. Okay. Links:
- Pandas and SQL Alchemy: Specify Column Data Types
- Pandas to_sql changing datatype in database table
- https://docs.sqlalchemy.org/en/14/dialects/oracle.html#oracle-data-types
from sqlalchemy.types import Integer, String
from sqlalchemy.dialects.oracle import NUMBER, VARCHAR2, DATE

oracle_dtypes = {
    'id': NUMBER(38, 0),
    'counts': Integer,
    'name': VARCHAR2(50),
    'swear_words': String(9999),
    'date': DATE()
}
df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="replace",
    index=False,
    dtype=oracle_dtypes
)
It never fails: some random group of columns gets converted to CLOB or some other unwanted datatype anyway. What should I do?
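As far as I can tell, the CLOBs are usually pandas' doing rather than Oracle's: to_sql maps any object-dtype column to the generic Text type, and the Oracle dialect renders Text as CLOB. A minimal sketch of that mapping (assuming SQLAlchemy 1.4, per the docs link above):

import pandas as pd
import sqlalchemy as sa
from sqlalchemy.dialects import oracle

# Strings (or strings mixed with nulls) come in as object dtype...
df = pd.DataFrame({"name": ["alice", None]})
print(df["name"].dtype)  # object

# ...and the generic Text type renders as CLOB under the Oracle dialect,
# which is what to_sql falls back to for object columns.
print(sa.types.Text().compile(dialect=oracle.dialect()))  # CLOB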
Things I've tried
Here's what I've tried that didn't work:
- truncating the table first (sending a SQL statement to the db from Python), then using if_exists="append" (see the sketch after this list)
- using if_exists="replace"
- using only the Oracle-specific sqlalchemy dialect datatypes
- using only the generic sqlalchemy datatypes
- using a mix of both, just because I'm frustrated
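The truncate-then-append attempt looked roughly like this (a sketch; df_upload, db.engine, and oracle_dtypes are from my setup above):

import sqlalchemy as sa

# Truncate first so the existing column definitions are kept,
# then append into the (unchanged) table.
with db.engine.begin() as conn:
    conn.execute(sa.text("TRUNCATE TABLE SA_COVID.ORACLE_TABLE"))

df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="append",
    index=False,
    dtype=oracle_dtypes
)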
Maybe it's an Oracle-specific issue?
Things I haven't tried:
- Dropping the table and just recreating it before the to_sql insert
- Running to_sql ad hoc and then sending a series of ALTER TABLE tbl_name MODIFY col_name statements (sketched below)
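The ALTER TABLE idea would look something like this sketch. The column name and VARCHAR2 length are illustrative, and note that Oracle won't MODIFY a column that already landed as CLOB straight to VARCHAR2, so this only helps for columns that aren't already LOBs:

import sqlalchemy as sa

# Let to_sql create the table however it wants first...
df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="replace",
    index=False
)

# ...then adjust each offending (non-LOB) column after the fact.
with db.engine.begin() as conn:
    conn.execute(sa.text(
        "ALTER TABLE SA_COVID.ORACLE_TABLE MODIFY (name VARCHAR2(50))"
    ))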
CodePudding user response:
I faced a similar issue when I was using df.to_sql:
import sqlalchemy as sa

df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="replace",
    index=False,
    dtype=oracle_dtypes
)
Change your dtypes like this (note that NUMBER, VARCHAR2, and DATE come from the Oracle dialect, not from sqlalchemy.types):

from sqlalchemy.dialects import oracle

oracle_dtypes = {
    'id': oracle.NUMBER(38, 0),
    'counts': sa.types.Integer,
    'name': oracle.VARCHAR2(50),
    'swear_words': sa.types.String(9999),
    'date': oracle.DATE()
}
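You can then double-check what Oracle actually created by querying the data dictionary (a sketch using the same engine; all_tab_columns is a standard Oracle view):

import sqlalchemy as sa

# Inspect the column types Oracle actually created for the table.
with db.engine.connect() as conn:
    rows = conn.execute(sa.text(
        "SELECT column_name, data_type, data_length "
        "FROM all_tab_columns "
        "WHERE owner = 'SA_COVID' AND table_name = 'ORACLE_TABLE'"
    ))
    for column_name, data_type, data_length in rows:
        print(column_name, data_type, data_length)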
CodePudding user response:
Turns out I needed to double-check the datatypes coming from the API into my pandas dataframe (I made a dumb assumption that the data was clean)... df.info() really helped. I used pd.Int64Dtype() for integer columns with null values, and 'datetime64[ns]' for datetimes.
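For anyone curious, the cleanup ended up looking roughly like this (a sketch; the column names are from my example above):

import pandas as pd

# Inspect what actually came back from the API.
df_upload.info()

# Plain int64 can't hold NaN, so pandas silently upcasts such columns
# to float64; the nullable Int64 dtype (capital I) keeps them integers.
df_upload["counts"] = df_upload["counts"].astype(pd.Int64Dtype())

# Proper datetime64[ns] values instead of object/string columns.
df_upload["date"] = pd.to_datetime(df_upload["date"])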