I am running the following Python code (simplified):
db = mysql.connector.connect(**dbconfig)
cur = db.cursor()
for row in dataFrame:
cur.execute(INSERT [TABLE NAME] column, VALUES ({row}))
db.commit() # should this be inside or outside the loop?
My question is will I expect a difference in performance based on whether the .commit() statement is inside the loop (runs each iteration) or outside (runs after all .execute statements).
Both ways seem to work fine and I cannot notice any difference in performance for small amounts of data (< 100 rows). Eventually, I will be processing many more rows and so I want to get this right now as I expect its going to make a difference.
CodePudding user response:
It depends.
COMMIT
takes some time; more commits = more time taken.
On the other hand, the more things you put into a single transaction, the longer that transaction takes and the more time that it interferes with other processes. And the greater chance of a deadlock.
Mor most situations, group statements together that need to be "atomically" run or fail. That is what transactions are for. If you end up with performance or deadlock problems, then we should look into the details to decide what to do -- on a case by case.