I'm trying to create a tuple that is based on a dictionary of df column names to pass through my cursor when dumping data into a database.
Imagine a df with 3 columns:
colA colB colC
1awf pref 02-03-2021
2fef cpvd 02-02-2021
3ace pfva 02-05-2021
4cat raof 01-03-2021
5agt btoa 04-01-2021
And a dictionary that I created to define the data types of each column in the df:
coldict =
{'colA':'varchar(4)',
'colB':'varchar(4)',
'colC':'date'}
Normally the cursor statement I'd use to dump this dataframe into a database table (named table_A in here) would look like this:
for i,r in df.iterrows():
cursor.execute(
'INSERT INTO table_A(
[colA],[colB],[colC]) VALUES (?,?,?)',
(row['colA'],row['colB'],row['colC']))
Now imagine if the table had 40 columns instead. That's a lot of typing! I'm wondering if it's possible to make this particular statement more dynamic in a function, where I'd pass the dictionary and it would loop through it to create the same 2 parts of the cursor.execute statement. The INSERT statement is straightforward enough, and I just looped thru the dictionary keys of coldict for that and concatenated the strings. The number of question marks to pass through is also pretty easy since it's just the length of the dictionary keys. However, I'm struggling to do the same for the second part - that is (row['colA'],row['colB'],row['colC'])... is that possible? The final desired output would look like this:
def datadump(df, insert_statement, rep_qs, df_cols):
for i,r in df.iterrows():
cursor.execute(
insert_statement) VALUES (rep_qs)',
df_cols)
where I just need a way to dynamically create the tuple that is df_cols. Hope I'm making sense here! Thank you!
CodePudding user response:
Perhaps this will work for your use case. See how to simplify the so many question mark in the insert string ? as reference
data
is a list of tuples for columns taken from the dictionary you've defined
data = list(map(tuple, df[coldict.keys()].values))
[('1awf', 'pref', '02-03-2021'),
('2fef', 'cpvd', '02-02-2021'),
('3ace', 'pfva', '02-05-2021'),
('4cat', 'raof', '01-03-2021'),
('5agt', 'btoa', '04-01-2021')]
parameters = ", ".join(["?"] * len(coldict.keys()))
query = 'INSERT INTO the_table VALUES(%s);' % parameters
con.executemany(query, data)
CodePudding user response:
Series.to_numpy after iterrows should be able to do it.
Consider the code below:
import numpy as np
import pandas as pd
coldict = {'colA':'varchar(4)',
'colB':'varchar(4)',
'colC':'date'}
df = pd.DataFrame(data=[['1awf', 'pref', '02-03-2021'],['2fef', 'cpvd', '02-02-2021'],['3ace', 'pfva', '02-05-2021'],['4cat', 'raof', '01-03-2021'],['5agt', 'raof', '01-03-2021']], columns=['colA', 'colB', 'colC'])
def format_ins_stmt(table_name, coldict):
k = list(coldict.keys())
colliststr = ""
rep_qs = ""
for col in k:
colliststr = colliststr "[" col "],"
rep_qs = rep_qs "?,"
return "INSERT INTO " table_name "(" colliststr[:-1] ") VALUES (" rep_qs[:-1] ")"
def datadump(df, insert_statement):
for i,r in df.iterrows():
lst = r.to_numpy()
# cursor.execute(
# insert_statement,
# lst)
datadump(df, format_ins_stmt('tabA', coldict))