I am running python 3.10, pandas 1.5.2, sqlalchemy 1.4.46, and pyodbc 4.0.35 on an Ubuntu subsystem on Windows (WSL2).
I think the issue I am having was a bug in prior pandas versions, but everyone is saying it has been fixed, so I am not sure why I am having this issue.
I have a dataframe, df, like this:
| int_col_1 | int_col_2 | string_col |
|---|---|---|
| 1 | 10 | 'val1' |
| 2 | 20 | None |
| 3 | None | 'val3' |
Then I push this to an MSSQL database using this code:
from sqlalchemy import create_engine
from sqlalchemy.engine import URL

connection_string = (
    'DRIVER={ODBC Driver 18 for SQL Server};SERVER=' + server
    + ';DATABASE=' + database + ';UID=' + username
    + ';PWD=' + password + ';Encrypt=no'
)
connection_url = URL.create("mssql+pyodbc", query={"odbc_connect": connection_string})
engine = create_engine(connection_url)
df.to_sql('table', engine, if_exists='replace', index=False)
Or with dtypes like this:
from sqlalchemy.types import Integer, String
df.to_sql('table', engine, if_exists='replace', index=False, dtype={'int_col_1': Integer(), 'int_col_2': Integer(), 'string_col': String()})
Either way, this is what is returned when I pull the table back from SQL:
| int_col_1 | int_col_2 | string_col |
|---|---|---|
| 1 | 10.0 | 'val1' |
| 2 | 20.0 | None |
| 3 | NaN | 'val3' |
You'll notice that int_col_1 is fine (int64), string_col is fine (object with NoneType), but int_col_2 has been turned into float64.
I know there are methods to correct the data after pulling it from SQL, but I am looking for a way to get the data stored correctly in SQL when it is pushed from pandas, and I have no idea how to do that. Any help would be really appreciated.
CodePudding user response:
You can cast the column to pandas' nullable Int64 dtype, which keeps integer values while representing missing entries as pd.NA instead of forcing a float64 upcast:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, np.nan]})
df['a'] = df['a'].astype('Int64')
df
      a
0     1
1     2
2  <NA>
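Applied to the frame from the question (column names and values taken from the post), a minimal sketch of the cast before pushing; once int_col_2 is Int64, to_sql should write NULLs into an integer-typed column rather than floats:

```python
import pandas as pd

# Reconstruction of the question's dataframe
df = pd.DataFrame({
    "int_col_1": [1, 2, 3],
    "int_col_2": [10, 20, None],
    "string_col": ["val1", None, "val3"],
})

# int_col_2 arrives as float64 because NaN forces the upcast
print(df["int_col_2"].dtype)  # float64

# Nullable Int64 keeps whole numbers and stores the missing value as pd.NA
df["int_col_2"] = df["int_col_2"].astype("Int64")
print(df["int_col_2"].dtype)  # Int64
```

After this cast, `df.to_sql('table', engine, if_exists='replace', index=False)` can be called as before; the dtype= mapping to Integer() is then honored because pandas no longer hands SQLAlchemy a float column.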