pandas df.to_sql to Oracle database datatype inconsistency


Problem

I have a pandas DataFrame and I'm trying to use the pd.DataFrame.to_sql() function to write it to an Oracle database. My Oracle database is version 19.3 (19c). Seems easy enough, right? Why won't it work?

I saw in a few other Stack Overflow posts that I should be using SQLAlchemy datatypes. Okay:

from sqlalchemy.types import Integer, String
from sqlalchemy.dialects.oracle import NUMBER, VARCHAR2, DATE

oracle_dtypes = {
    'id': NUMBER(38, 0),
    'counts': Integer,
    'name': VARCHAR2(50),
    'swear_words': String(9999),
    'date': DATE()
}

df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="replace",
    index=False,
    dtype=oracle_dtypes
)

It never fails to convert random columns to CLOB or some other unexpected datatype. What should I do?

Things I've tried

These didn't work:

  • truncating the table first (sending a SQL statement to the db from Python), then if_exists="append" (see the sketch below)
  • using if_exists="replace"
  • using only the Oracle-specific SQLAlchemy dialect datatypes
  • using only the generic SQLAlchemy datatypes
  • using a mix of both, just because I'm frustrated
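
Here's roughly what that truncate-then-append attempt looked like (an untested reconstruction; exec_driver_sql assumes SQLAlchemy 1.4+):

# Truncate keeps the table definition, so the existing column
# datatypes should survive the reload
with db.engine.begin() as conn:
    conn.exec_driver_sql('TRUNCATE TABLE SA_COVID.oracle_table')

df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="append",  # append into the already-defined table
    index=False,
    dtype=oracle_dtypes
)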

Maybe it's an Oracle-specific issue?

Things I haven't tried:

  • Dropping the table and just recreating it before the insert (see the sketch after this list)
  • Running to_sql ad hoc and then sending a series of ALTER TABLE tbl_name MODIFY col_name statements
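
Something like this is what I have in mind for the first one (untested; the column list is illustrative, not my real table):

# Untested sketch of "drop and recreate before insert": define the
# DDL yourself, then let to_sql only append rows
with db.engine.begin() as conn:
    conn.exec_driver_sql('DROP TABLE SA_COVID.oracle_table')
    conn.exec_driver_sql("""
        CREATE TABLE SA_COVID.oracle_table (
            id     NUMBER(38,0),
            counts INTEGER,
            name   VARCHAR2(50)
        )
    """)

df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="append",
    index=False
)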

CodePudding user response:

I faced a similar issue when I was using df.to_sql:

import sqlalchemy as sa

df_upload.to_sql(
    "oracle_table",
    db.engine,
    schema="SA_COVID",
    if_exists="replace",
    index=False,
    dtype=oracle_dtypes
)

Change your dtypes to the generic SQLAlchemy types. NUMBER and VARCHAR2 only exist in the Oracle dialect, not in sa.types; the Oracle dialect renders the generic NUMERIC, VARCHAR, and DATE as NUMBER, VARCHAR2, and DATE for you:

oracle_dtypes = {
    'id': sa.types.NUMERIC(38, 0),
    'counts': sa.types.Integer,
    'name': sa.types.VARCHAR(50),
    # note: VARCHAR2 tops out at 4000 bytes unless the database
    # is set up with MAX_STRING_SIZE=EXTENDED
    'swear_words': sa.types.String(9999),
    'date': sa.types.DATE()
}
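
For what it's worth, the stray CLOB columns usually come from pandas itself: when a column has object dtype and you don't pass an explicit entry in dtype, to_sql maps it to the generic Text type, which the Oracle dialect renders as CLOB. Giving every string column an explicit length avoids that fallback.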

CodePudding user response:

Turns out I needed to double-check the incoming datatypes from the API into my pandas DataFrame (I'd made a dumb assumption that the data was clean)... df.info() really helped. I used pd.Int64Dtype() for integer columns with null values, and 'datetime64[ns]' for datetimes.
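
A minimal sketch of that cleanup, using the column names from the question:

import pandas as pd

# See what actually came back from the API
df_upload.info()

# Nullable integer dtype keeps NULLs without forcing the column to float
df_upload['counts'] = df_upload['counts'].astype(pd.Int64Dtype())

# Parse strings into real datetime64[ns] values
df_upload['date'] = pd.to_datetime(df_upload['date'])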
