I'm trying to get to the bottom of what I thought would be a simple problem: exporting a pandas DataFrame to a MySQL database.
I have a scraper that collates data in pandas and saves it in CSV format:
**title, summary, url** #header
abc, summary, some_url
But I'd like to send the DataFrame directly to a MySQL database with the same three-column format.
My code so far is:
import mysql.connector
# Connect to the database
conn = mysql.connector.connect(user='root', password='somepassword', host='localhost', port='3306', database='db')
# Write the DataFrame to the database
df.to_sql(name='table_name', con=conn, if_exists='replace', index=False)
# Close the connection
conn.close()
But this returns an error message of:
pandas.errors.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': Not all parameters were used in the SQL statement
How do I fix this?
UPDATE:
I have read that I might have to use sqlalchemy but I really want to stick with pandas' solution if possible.
CodePudding user response:
You can still use the pandas solution, but you have to use `sqlalchemy.create_engine` instead of `mysql.connector.connect`, since `to_sql` expects a "`sqlalchemy.engine` (`Engine` or `Connection`) or `sqlite3.Connection`" as its `con` argument. See the `DataFrame.to_sql` documentation for reference and examples. This should work:
import sqlalchemy
# Create a SQLAlchemy engine for the MySQL database
# (note the dialect+driver scheme: mysql+mysqlconnector)
engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://root:somepassword@localhost:3306/db')
# Write the DataFrame to the database
df.to_sql(name='table_name', con=engine, if_exists='replace', index=False)
# Release the engine's connection pool when done
engine.dispose()
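If you want to sanity-check the `to_sql` call before pointing it at a live MySQL server, the same pattern works against an in-memory SQLite engine, since `to_sql` accepts any SQLAlchemy engine. A minimal sketch (the sample data below just mirrors the three-column layout from the question):

```python
import pandas as pd
import sqlalchemy

# Sample DataFrame mirroring the scraper's three-column layout
df = pd.DataFrame({
    'title': ['abc'],
    'summary': ['summary'],
    'url': ['some_url'],
})

# In-memory SQLite engine; for MySQL, swap in the
# mysql+mysqlconnector://user:password@host:port/db URL
engine = sqlalchemy.create_engine('sqlite://')

# Write the DataFrame, then read it back to confirm the round trip
df.to_sql(name='table_name', con=engine, if_exists='replace', index=False)
result = pd.read_sql('SELECT title, summary, url FROM table_name', con=engine)
print(result.shape)  # (1, 3)
engine.dispose()
```

Once this round trip works, only the connection URL needs to change for the MySQL case.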
CodePudding user response:
If you are not passing a SQLAlchemy connection, then to_sql assumes that you are using a SQLite backend. That is why you are receiving the error shown.
Do you have to use to_sql? You can use mysql.connector directly. There are some examples in the MySQL documentation here: https://dev.mysql.com/doc/connector-python/en/connector-python-example-cursor-transaction.html
You could iterate through your DataFrame and insert the rows, like this:
import mysql.connector

conn = mysql.connector.connect(user='root', password='somepassword',
                               host='localhost', port='3306', database='db')
cursor = conn.cursor()
# Build the statement once; the driver fills in the %s placeholders per row
sql = "INSERT INTO table_name (title, summary, url) VALUES (%s, %s, %s)"
for _, row in df.iterrows():
    cursor.execute(sql, tuple(row))
conn.commit()
conn.close()
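Calling `execute` once per row means one round trip to the server per insert, which gets slow for large frames. DB-API cursors also provide `executemany`, which sends the rows as a batch. A sketch of that pattern, using an in-memory SQLite database as a stand-in so it runs without a MySQL server (with `mysql.connector` the structure is identical, but the placeholder style is `%s` rather than `?`):

```python
import sqlite3

import pandas as pd

# Sample DataFrame standing in for the scraper's output
df = pd.DataFrame({
    'title': ['abc', 'def'],
    'summary': ['summary1', 'summary2'],
    'url': ['url1', 'url2'],
})

# In-memory SQLite database as a stand-in for the MySQL connection
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE table_name (title TEXT, summary TEXT, url TEXT)')

# itertuples is faster than iterrows; one executemany call batches all rows
rows = list(df.itertuples(index=False, name=None))
cursor.executemany(
    'INSERT INTO table_name (title, summary, url) VALUES (?, ?, ?)', rows)
conn.commit()

cursor.execute('SELECT COUNT(*) FROM table_name')
count = cursor.fetchone()[0]
print(count)  # 2
conn.close()
```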
Ideally, if you want to do performant inserts with MySQL as the backend storage, use SQLAlchemy. Alternatively, use MySQL's LOAD DATA INFILE to import the CSV directly.
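For the LOAD DATA INFILE route, a rough sketch of the statement is below. The file path is a placeholder for wherever the scraper writes its CSV, and the LOCAL variant may require local_infile to be enabled on both the server and the client:

```sql
-- Placeholder path; adjust to the scraper's actual CSV location
LOAD DATA LOCAL INFILE '/path/to/scraped.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES          -- skip the title, summary, url header row
(title, summary, url);
```

This bypasses Python entirely for the import step, which is usually the fastest option for bulk loads.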