Pandas to_sql int column turning to float64 because of null values


I am running python 3.10, pandas 1.5.2, sqlalchemy 1.4.46, and pyodbc 4.0.35 on an Ubuntu subsystem on Windows (WSL2).

I think the issue I am having was a bug in prior pandas versions, but everyone is saying it has been fixed, so I am not sure why I am having this issue.

I have a dataframe like this:

    int_col_1  int_col_2  string_col
            1         10      'val1'
            2         20        None
            3       None      'val3'

Then I push this to an MSSQL database using this code:

from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = 'DRIVER={ODBC Driver 18 for SQL Server};SERVER=' + server + ';DATABASE=' + database + ';UID=' + username + ';PWD=' + password + ';Encrypt=no'
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
df.to_sql('table', engine, if_exists='replace', index=False)

Or with dtypes like this:

from sqlalchemy.types import Integer, String
df.to_sql('table', engine, if_exists='replace', index=False, dtype={'int_col_1': Integer(), 'int_col_2': Integer(), 'string_col': String()})

Either way, this is what is returned if I pull the table back from SQL:

    int_col_1  int_col_2  string_col
            1       10.0      'val1'
            2       20.0        None
            3        NaN      'val3'

You'll notice that int_col_1 is fine (int64), string_col is fine (object with NoneType), but int_col_2 is turned into float64.

I know there are methods to correct the data after pulling from SQL, but I am looking for a way to get the data to be correct in SQL when pushed from pandas and I have no idea how to do that. Any help would be really appreciated.

CodePudding user response:

You can cast the column to pandas' nullable Int64 dtype before pushing it:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan]})
df['a'] = df['a'].astype('Int64')
df
    a
0   1
1   2
2   <NA>
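Here is a minimal end-to-end sketch of the round trip, using an in-memory SQLite engine in place of the MSSQL connection from the question (the table name example_table is made up for illustration). The point is that after the Int64 cast, to_sql writes real integers and a SQL NULL rather than 10.0/20.0/NaN:

    import pandas as pd
    from sqlalchemy import create_engine, text
    from sqlalchemy.types import Integer, String

    # Illustration only: an in-memory SQLite engine stands in for MSSQL
    engine = create_engine("sqlite://")

    df = pd.DataFrame({
        "int_col_1": [1, 2, 3],
        "int_col_2": [10, 20, None],   # inferred as float64 because of the None
        "string_col": ["val1", None, "val3"],
    })

    # Cast before pushing: Int64 holds integers alongside <NA>
    # instead of coercing the whole column to float64
    df["int_col_2"] = df["int_col_2"].astype("Int64")

    df.to_sql("example_table", engine, if_exists="replace", index=False,
              dtype={"int_col_1": Integer(),
                     "int_col_2": Integer(),
                     "string_col": String()})

    with engine.connect() as conn:
        rows = conn.execute(text("SELECT int_col_2 FROM example_table")).fetchall()
    # rows now contains integers 10 and 20, and a SQL NULL for the missing value

Note that the dtype mapping alone cannot fix this: by the time to_sql sees the dataframe, the column is already float64 in memory, so the cast has to happen on the pandas side first.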