Pandas - CSV to SQL database

I'm really new to all this, and I'm currently working on something that is stretching my skills and understanding, so I'd appreciate some help.

I have a large CSV file that I want to read into a pandas DataFrame and then push to an existing MariaDB database made up of multiple entities. I'm currently attempting this with the to_sql() function. Running the script does not seem to push any data from the DataFrame to the entity in the database, which remains empty after the run.

FYI the dataframe contains multiple columns, only some of which I want to push to the relevant entity.

Any pointers greatly appreciated!

conn = sqlalchemy .create_engine("mysql://root:@127.0.0.1:3306")  
cur = conn.cursor()

cur.execute(databse = "pollution")

sites= """CREATE TABLE IF NOT EXISTS `sites`
  (`siteID` INT(20) NOT NULL,
  `Location` VARCHAR(48) NULL,
  `geo_point_2d` VARCHAR(150) NULL,
  PRIMARY KEY (`siteID`))"""

cur.execute(sites)

df = pd.read_csv("data.csv", sep=",", parse_dates=["DateTime","DateStart","DateEnd"],low_memory=False)

df.to_sql('sites', conn, if_exists='append',index=False)
conn.commit()

CodePudding user response：

Try to commit your execution after creating the table.

cur.execute(sites)

conn.commit()
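To illustrate why a missing commit can leave a table looking empty, here is a minimal sketch. It uses Python's built-in sqlite3 module with a temporary file as a stand-in for MariaDB (an assumption made so the snippet runs without a server), and the single row is made-up sample data. The principle carries over: a second connection does not see the insert until the first one commits.

```python
import os
import sqlite3
import tempfile

# Temporary SQLite file standing in for the MariaDB database (assumption).
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)  # the connection your script writes with
reader = sqlite3.connect(path)  # e.g. a separate client inspecting the table

writer.execute("CREATE TABLE sites (siteID INTEGER PRIMARY KEY, Location TEXT)")
writer.commit()

# The insert opens a transaction that is not yet committed.
writer.execute("INSERT INTO sites VALUES (1, 'Site A')")
before = reader.execute("SELECT COUNT(*) FROM sites").fetchone()[0]

# After the commit the row becomes visible to the other connection.
writer.commit()
after = reader.execute("SELECT COUNT(*) FROM sites").fetchone()[0]

print(before, after)

writer.close()
reader.close()
```

Note that commit has to be called (with parentheses) on a connection object; a SQLAlchemy Engine itself has neither cursor() nor commit().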

CodePudding user response：

It looks like it might be failing because you are trying to insert columns that are not part of the sites schema, such as "DateTime", "DateStart" and "DateEnd". If that's the case, either add those columns to the CREATE TABLE statement in the sites string, or subset your DataFrame so it only contains the columns the table has.
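Here is a rough sketch of the subsetting option. The sample DataFrame, the "NOx" column and the in-memory SQLite connection are all stand-ins I made up so the snippet runs on its own; with MariaDB you would pass your SQLAlchemy engine to to_sql() instead. I'm also assuming the CSV repeats each site on many rows, which is why duplicates are dropped before inserting into a table whose primary key is siteID.

```python
import sqlite3

import pandas as pd

# Made-up sample data; the real CSV has more columns than the sites table.
df = pd.DataFrame({
    "siteID": [1, 1, 2],
    "Location": ["Site A", "Site A", "Site B"],
    "geo_point_2d": ["0.0,0.0", "0.0,0.0", "1.0,1.0"],
    "DateTime": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-01"]),
    "NOx": [10.5, 11.0, 7.25],
})

# In-memory SQLite stands in for MariaDB (assumption, for a runnable demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sites ("
    "siteID INTEGER NOT NULL PRIMARY KEY, "
    "Location VARCHAR(48), "
    "geo_point_2d VARCHAR(150))"
)

# Keep only the columns the table has, one row per site.
site_columns = ["siteID", "Location", "geo_point_2d"]
sites_df = df[site_columns].drop_duplicates(subset="siteID")

sites_df.to_sql("sites", conn, if_exists="append", index=False)
conn.commit()

rows = conn.execute("SELECT siteID, Location FROM sites ORDER BY siteID").fetchall()
print(rows)
```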

If you can't get this to work with df.to_sql(), here's a workaround. You'll need to create a Table object. If the table already exists in the database, pass autoload_with set to your engine (which I think is 'conn' for you) so that SQLAlchemy reflects the column definitions from the database.

Convert the DataFrame into a list of records (a list of dictionaries), so that each row becomes its own dictionary. Each key is a column name, and each value is that row's value for the column.

Then you can insert the list of dictionaries into the database.

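import pandas as pd
import sqlalchemy
from sqlalchemy import text

# Name the database in the URL; an Engine has no cursor() to select it with.
engine = sqlalchemy.create_engine("mysql://root:@127.0.0.1:3306/pollution")

sites= """CREATE TABLE IF NOT EXISTS `sites`
  (`siteID` INT(20) NOT NULL,
  `Location` VARCHAR(48) NULL,
  `geo_point_2d` VARCHAR(150) NULL,
  PRIMARY KEY (`siteID`))"""

# Run the DDL on a connection; engine.begin() commits when the block exits.
with engine.begin() as conn:
    conn.execute(text(sites))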


df = pd.read_csv("data.csv", sep=",", parse_dates=["DateTime","DateStart","DateEnd"],low_memory=False)

# Specify which columns you want, as strings, within the double brackets.
df = df[['siteID','Location','geo_point_2d']]

# Or, if you have too many columns to list by hand but have a specific range in mind,
# select by position instead (use one approach or the other).
# This tells pandas to get all rows and the first four columns;
# the colon means start:stop.
# df = df.iloc[:, :4]

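from sqlalchemy import MetaData, Table

metadata_obj = MetaData()
records = df.to_dict(orient='records')

# Note: rather than creating a SQL string you can also create the table with a Table object; check out the SQLAlchemy tutorial I linked.

# Reflect the existing table definition from the database.
sites = Table('sites', metadata_obj, autoload_with=engine)

# engine.begin() commits the inserts when the block exits without an error.
with engine.begin() as conn:
    conn.execute(sites.insert(), records)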
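If you want to try the workaround without a MariaDB server, the sketch below runs the same steps against an in-memory SQLite database. The SQLite URL and the two sample rows are assumptions for the demo, not part of your setup; swap the URL for your MariaDB one. It assumes SQLAlchemy 1.4 or later and a recent pandas.

```python
import pandas as pd
from sqlalchemy import MetaData, Table, create_engine, select, text

# In-memory SQLite stands in for the MariaDB engine (assumption).
engine = create_engine("sqlite:///:memory:")

# Create the table first so there is something to reflect.
with engine.begin() as conn:
    conn.execute(text(
        "CREATE TABLE sites ("
        "siteID INTEGER NOT NULL PRIMARY KEY, "
        "Location VARCHAR(48), "
        "geo_point_2d VARCHAR(150))"
    ))

# Made-up sample rows in place of the CSV.
df = pd.DataFrame({
    "siteID": [1, 2],
    "Location": ["Site A", "Site B"],
    "geo_point_2d": ["0.0,0.0", "1.0,1.0"],
})

# One dictionary per row, keyed by column name.
records = df.to_dict(orient="records")

# Reflect the column definitions from the existing table.
sites = Table("sites", MetaData(), autoload_with=engine)

# Passing a list of dictionaries inserts all rows in one call;
# engine.begin() commits when the block exits.
with engine.begin() as conn:
    conn.execute(sites.insert(), records)

with engine.connect() as conn:
    rows = conn.execute(select(sites)).fetchall()

print(len(rows))
```

The dictionary keys have to match the table's column names, which is why the DataFrame is cut down to those columns before calling to_dict().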

I also think it would help if you read through the SQLAlchemy Core documentation. This tutorial will show you how to create a Table object with the columns you want.