Home > Software engineering >  How to create a tuple of row[colname] from df.iterrows() based off a dictionary
How to create a tuple of row[colname] from df.iterrows() based off a dictionary

Time:12-31

I'm trying to create a tuple that is based on a dictionary of df column names to pass through my cursor when dumping data into a database.

Imagine a df with 3 columns:

colA   colB   colC
1awf   pref   02-03-2021   
2fef   cpvd   02-02-2021
3ace   pfva   02-05-2021
4cat   raof   01-03-2021
5agt   btoa   04-01-2021

And a dictionary that I created to define the data types of each column in the df:

coldict = 
{'colA':'varchar(4)',
'colB':'varchar(4)',
'colC':'date'}

Normally the cursor statement I'd use to dump this dataframe into a database table (named table_A in here) would look like this:

for i,r in df.iterrows():
    cursor.execute(
    'INSERT INTO table_A(
    [colA],[colB],[colC]) VALUES (?,?,?)',
    (row['colA'],row['colB'],row['colC']))

Now imagine if the table had 40 columns instead. That's a lot of typing! I'm wondering if it's possible to make this particular statement more dynamic in a function, where I'd pass the dictionary and it would loop through it to create the same 2 parts of the cursor.execute statement. The INSERT statement is straightforward enough, and I just looped thru the dictionary keys of coldict for that and concatenated the strings. The number of question marks to pass through is also pretty easy since it's just the length of the dictionary keys. However, I'm struggling to do the same for the second part - that is (row['colA'],row['colB'],row['colC'])... is that possible? The final desired output would look like this:

def datadump(df, insert_statement, rep_qs, df_cols):
   for i,r in df.iterrows():
       cursor.execute(
       insert_statement) VALUES (rep_qs)',
       df_cols)

where I just need a way to dynamically create the tuple that is df_cols. Hope I'm making sense here! Thank you!

CodePudding user response:

Perhaps this will work for your use case. See how to simplify the so many question mark in the insert string ? as reference

data is a list of tuples for columns taken from the dictionary you've defined

data = list(map(tuple, df[coldict.keys()].values))

[('1awf', 'pref', '02-03-2021'),
 ('2fef', 'cpvd', '02-02-2021'),
 ('3ace', 'pfva', '02-05-2021'),
 ('4cat', 'raof', '01-03-2021'),
 ('5agt', 'btoa', '04-01-2021')]

parameters = ", ".join(["?"] * len(coldict.keys()))
query = 'INSERT INTO the_table VALUES(%s);' % parameters
con.executemany(query, data)

CodePudding user response:

Series.to_numpy after iterrows should be able to do it.

Consider the code below:

import numpy as np
import pandas as pd

coldict = {'colA':'varchar(4)',
'colB':'varchar(4)',
'colC':'date'}

df = pd.DataFrame(data=[['1awf', 'pref', '02-03-2021'],['2fef', 'cpvd', '02-02-2021'],['3ace', 'pfva', '02-05-2021'],['4cat', 'raof', '01-03-2021'],['5agt', 'raof', '01-03-2021']], columns=['colA', 'colB', 'colC'])

def format_ins_stmt(table_name, coldict):
    k = list(coldict.keys())
    colliststr = ""
    rep_qs = ""
    for col in k:
        colliststr = colliststr   "["   col   "],"
        rep_qs = rep_qs   "?,"

    return "INSERT INTO "   table_name   "("   colliststr[:-1]   ") VALUES ("   rep_qs[:-1]   ")"

def datadump(df, insert_statement):
   for i,r in df.iterrows():
       lst = r.to_numpy()
#       cursor.execute(
#       insert_statement,
#       lst)

datadump(df, format_ins_stmt('tabA', coldict))

  • Related