I'm trying to get to the bottom of what I thought would be a simple problem: exporting a pandas DataFrame to a MySQL database.
I have a scraper that collates data in pandas and saves it in CSV format:
**title, summary, url** #header
abc, summary, some_url
But I'd like to send the DataFrame directly to a MySQL database with the same three-column format.
My code so far is:
import mysql.connector
# Connect to the database
conn = mysql.connector.connect(user='root', password='somepassword', host='localhost', port='3306', database='db')
# Write the DataFrame to the database
df.to_sql(name='table_name', con=conn, if_exists='replace', index=False)
# Close the connection
conn.close()
But this returns an error message of:
pandas.errors.DatabaseError: Execution failed on sql 'SELECT name FROM sqlite_master WHERE type='table' AND name=?;': Not all parameters were used in the SQL statement
How do I fix this?
UPDATE:
I have read that I might have to use sqlalchemy but I really want to stick with pandas' solution if possible.
CodePudding user response:
You can still use the pandas solution, but you have to use `sqlalchemy.create_engine` instead of `mysql.connector.connect`, since `to_sql` expects a "`sqlalchemy.engine` (`Engine` or `Connection`) or `sqlite3.Connection`" as its `con` argument. See the `DataFrame.to_sql` documentation for reference and examples. This should work:
import sqlalchemy
# Create a SQLAlchemy engine for the MySQL database
# (note the dialect+driver scheme: mysql+mysqlconnector)
engine = sqlalchemy.create_engine(
    'mysql+mysqlconnector://root:somepassword@localhost:3306/db')
# Write the DataFrame to the database
df.to_sql(name='table_name', con=engine, if_exists='replace', index=False)
# Release the engine's connection pool when done
engine.dispose()
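If you want to sanity-check the `to_sql` call before pointing it at a live MySQL server, the same pattern works against an in-memory SQLite engine, since `to_sql` accepts any SQLAlchemy engine. A minimal sketch (the sample data below just mirrors the three-column layout from the question):

```python
import pandas as pd
import sqlalchemy

# Sample DataFrame mirroring the scraper's three-column layout
df = pd.DataFrame({
    'title': ['abc'],
    'summary': ['summary'],
    'url': ['some_url'],
})

# In-memory SQLite engine; for MySQL, swap in the
# mysql+mysqlconnector://user:password@host:port/db URL
engine = sqlalchemy.create_engine('sqlite://')

# Write the DataFrame, then read it back to confirm the round trip
df.to_sql(name='table_name', con=engine, if_exists='replace', index=False)
result = pd.read_sql('SELECT title, summary, url FROM table_name', con=engine)
print(result.shape)  # (1, 3)
engine.dispose()
```

Once this round trip works, only the connection URL needs to change for the MySQL case.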
CodePudding user response:
If you are not passing a SQLAlchemy connection, then to_sql assumes that you are using a SQLite backend. That is why you are receiving the error shown.
Do you have to use to_sql? You can use mysql.connector directly. There are some examples in the MySQL documentation here: https://dev.mysql.com/doc/connector-python/en/connector-python-example-cursor-transaction.html
You could iterate through your DataFrame and insert the rows, like this:
import mysql.connector

conn = mysql.connector.connect(user='root', password='somepassword',
                               host='localhost', port='3306', database='db')
cursor = conn.cursor()
# Build the statement once; the driver fills in the %s placeholders per row
sql = "INSERT INTO table_name (title, summary, url) VALUES (%s, %s, %s)"
for _, row in df.iterrows():
    cursor.execute(sql, tuple(row))
conn.commit()
conn.close()
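Calling `execute` once per row means one round trip to the server per insert, which gets slow for large frames. DB-API cursors also provide `executemany`, which sends the rows as a batch. A sketch of that pattern, using an in-memory SQLite database as a stand-in so it runs without a MySQL server (with `mysql.connector` the structure is identical, but the placeholder style is `%s` rather than `?`):

```python
import sqlite3

import pandas as pd

# Sample DataFrame standing in for the scraper's output
df = pd.DataFrame({
    'title': ['abc', 'def'],
    'summary': ['summary1', 'summary2'],
    'url': ['url1', 'url2'],
})

# In-memory SQLite database as a stand-in for the MySQL connection
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE table_name (title TEXT, summary TEXT, url TEXT)')

# itertuples is faster than iterrows; one executemany call batches all rows
rows = list(df.itertuples(index=False, name=None))
cursor.executemany(
    'INSERT INTO table_name (title, summary, url) VALUES (?, ?, ?)', rows)
conn.commit()

cursor.execute('SELECT COUNT(*) FROM table_name')
count = cursor.fetchone()[0]
print(count)  # 2
conn.close()
```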
Ideally, if you want to do performant inserts with MySQL as the backend storage, use SQLAlchemy. Alternatively, use MySQL's LOAD DATA INFILE to import the CSV directly.
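For the LOAD DATA INFILE route, a rough sketch of the statement is below. The file path is a placeholder for wherever the scraper writes its CSV, and the LOCAL variant may require local_infile to be enabled on both the server and the client:

```sql
-- Placeholder path; adjust to the scraper's actual CSV location
LOAD DATA LOCAL INFILE '/path/to/scraped.csv'
INTO TABLE table_name
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES          -- skip the title, summary, url header row
(title, summary, url);
```

This bypasses Python entirely for the import step, which is usually the fastest option for bulk loads.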