I've got a mysql table defined as follows:
CREATE TABLE series_info (
series_id INT NOT NULL PRIMARY KEY,
s_area VARCHAR(40) NOT NULL,
s_code VARCHAR(10) NOT NULL,
dataset_version VARCHAR(20) NOT NULL
);
In my Python module, I'm trying to save a Pandas DataFrame into the table with the following code:
.
. insert 3 records in df_series_info (see output)
.
conn = sqlalchemy.create_engine( \
f"mysql mysqlconnector://{DATABASER_USER}:{DATABASER_PASSWORD}@{DATABASE_HOST}/{DATABASE}")
print(df_series_info.dtypes)
print(df_series_info.head())
df_series_info.to_sql(name='series_info', con=conn, if_exists='append', index=False)
But I get the following output:
series_id int64
s_area object
s_code object
dataset_version object
dtype: object
series_id s_area s_code dataset_version
0 7179385449897108 AFRG 170202 2022-12-29-SYN-01
0 6756567457160340 AFRG 170203 2022-12-29-SYN-01
0 7692362580057236 AFRG 160305 2022-12-29-SYN-01
Traceback (most recent call last):
File "D:\Projects\AIGreenFleet\dev\aigreenfleet-series-data-generator\.venv\lib\site-packages\mysql\connector\connection_cext.py", line 565, in cmd_query
self._cmysql.query(
_mysql_connector.MySQLInterfaceError: Duplicate entry '2147483647' for key 'PRIMARY'
.
.
.
The above exception was the direct cause of the following exception:
.
.
.
sqlalchemy.exc.IntegrityError: (mysql.connector.errors.IntegrityError) 1062 (23000): Duplicate entry '2147483647' for key 'PRIMARY'
[SQL: INSERT INTO series_info (series_id, s_area, s_code, dataset_version) VALUES (%(series_id)s, %(s_area)s, %(s_code)s, %(dataset_version)s)]
[parameters: ({'series_id': '2952743774805140', 's_area': 'AFRG', 's_code': 170202, 'dataset_version': '2022-12-29-SYN-01'}, {'series_id': '9080550407298196', 's_area': 'AFRG', 's_code': 170203, 'dataset_version': '2022-12-29-SYN-01'}, {'series_id': '2972479528656020', 's_area': 'AFRG', 's_code': 160305, 'dataset_version': '2022-12-29-SYN-01'})]
(Background on this error at: https://sqlalche.me/e/14/gkpj)
Now, as you see from the output of print(df_series_info.head())
, the value in the series_id field is definetly not 2147483647.
If I remove the PRIMARY KEY
clause from the table creation statement, the code works (but puts '2147483647' as value for series_id in all rows).
It seems that pandas or the mysql connector are converting the value of the series_id column prior to saving it to the database.
Any idea what I might be doing wrong?
CodePudding user response:
The problem is that your series_id
values are too high for storing them in an INT column in MySql (the max value is 2147483647
). You should change your create statement to modify the series_id
data type to BIGINT:
CREATE TABLE series_info (
series_id BIGINT NOT NULL PRIMARY KEY,
s_area VARCHAR(40) NOT NULL,
s_code VARCHAR(10) NOT NULL,
dataset_version VARCHAR(20) NOT NULL
);