Create a dataset for Pandas, from sql, in a more performant way without having to use setdefault and

Time:01-08

I create the pandas DataFrame like this. I don't load it directly from the database; instead, I first loop over the rows with setdefault and append. The reason is that the SQL uses an INNER JOIN, so the first five columns repeat and I have to collect row[5] separately with append.

Next, I use the dataset in pandas.

Is there a more performant way to create the dataset without using setdefault and append? Or is the code I'm using already performant?

import sqlite3
import pandas as pd

# Maps (row[0], ..., row[4]) -> list of row[5] values
newlist = {}

conn = sqlite3.connect('....')
cursor = conn.cursor()

x = cursor.execute('''SQL CODES WITH INNER JOIN''')

for row in x.fetchall():
    newlist.setdefault((row[0], row[1], row[2], row[3], row[4]), []).append(row[5])

# Transform dataset to DataFrame (one row per key, one column per collected value)
df = pd.DataFrame.from_dict(newlist, orient='index')

CodePudding user response:

I don't know your SQL query (and I don't have data to test with), but the easiest way is probably pd.read_sql. It runs the query and returns a DataFrame with one row per row of the result set.

Something like:

import sqlite3
import pandas as pd

conn = sqlite3.connect('....')
qs = '''SQL CODES WITH INNER JOIN'''

df = pd.read_sql(qs, conn)
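
Since the real query isn't shown, here is a minimal sketch of how that could look end to end. The `orders` and `items` tables, their columns, and the sample rows are hypothetical stand-ins for the real schema. It reads the joined rows with `pd.read_sql` and then, assuming the goal is still one list of values per key, collects the repeated column with `groupby(...).agg(list)` instead of `setdefault` and `append`:

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory schema standing in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer TEXT);
    CREATE TABLE items (order_id INTEGER, product TEXT);
    INSERT INTO orders VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO items VALUES (1, 'pen'), (1, 'ink'), (2, 'cup');
""")

query = """
    SELECT o.id, o.customer, i.product
    FROM orders AS o
    INNER JOIN items AS i ON i.order_id = o.id
    ORDER BY o.id, i.rowid
"""

# One DataFrame row per joined SQL row; column names come from the query.
long_df = pd.read_sql(query, conn)
conn.close()

# Collect the non-key column into one list per key,
# mirroring what setdefault(key, []).append(value) builds.
grouped = long_df.groupby(["id", "customer"], sort=False)["product"].agg(list)
```

Whether the grouping step is needed at all depends on what the later pandas code expects; the long form returned by `pd.read_sql` is often easier to work with directly.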

CodePudding user response:

One way to improve the performance of this code would be to use a list of tuples to store the data from the database query, and then use pandas.DataFrame() to create the DataFrame from the list of tuples. This will avoid the overhead of using a dictionary and the setdefault() and append() methods.

Here's how you could modify your code to do this:

import sqlite3
import pandas as pd

data = []

conn = sqlite3.connect('....')
cursor = conn.cursor()

x = cursor.execute('''SQL CODES WITH INNER JOIN''')

# Store each row as a plain tuple
for row in x.fetchall():
    data.append((row[0], row[1], row[2], row[3], row[4], row[5]))

# Create DataFrame from list of tuples
df = pd.DataFrame(data, columns=['col1', 'col2', 'col3', 'col4', 'col5', 'col6'])

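This should be more performant than the original code, since it skips the dictionary lookup on every row. Note that the result has a different shape: one row per SQL row with six columns, rather than one row per (row[0], ..., row[4]) key with the row[5] values spread across columns.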
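
As a rough sketch of the same idea, `fetchall()` already returns a list of tuples, so it can be passed to `pd.DataFrame` without a copying loop. The `demo` table and the `k1`, `k2`, `val` column names below are made up for illustration. If the wide layout produced by `from_dict(..., orient='index')` is still wanted, one possible way to get it is to number the values within each key and unstack:

```python
import sqlite3
import pandas as pd

# Hypothetical table standing in for the result of the INNER JOIN.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE demo (k1 TEXT, k2 TEXT, val INTEGER);
    INSERT INTO demo VALUES ('a', 'x', 10), ('a', 'x', 20), ('b', 'y', 30);
""")

cursor = conn.execute("SELECT k1, k2, val FROM demo ORDER BY rowid")

# Column names as reported by the cursor, so they need not be typed by hand.
columns = [desc[0] for desc in cursor.description]

# fetchall() is already a list of tuples; no per-row loop is needed.
df = pd.DataFrame(cursor.fetchall(), columns=columns)
conn.close()

# Optional: rebuild the wide layout (one row per key, one column per value).
# 'pos' is the position of each value within its key group.
pos = df.groupby(["k1", "k2"]).cumcount()
wide = df.assign(pos=pos).set_index(["k1", "k2", "pos"])["val"].unstack()
```

Keys with fewer values than the longest group get NaN in the extra columns, which matches how `from_dict` pads shorter lists.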
