I've been developing a very simple webpage over the past few days, and I decided to use Flask and SQLAlchemy as the back-end stack.
I was trying to find a way to upsert data. I found session.merge in a SO question, and a colleague also told me to use session.bulk_update_mappings. However, both are much slower than "query all -> manually update all".
Table DDL:
create table if not exists testdb.table_datetime
(
    id int auto_increment primary key,
    col1 varchar(50) null,
    col2 varchar(50) null,
    col3 varchar(50) null,
    col4 varchar(50) null,
    col5 varchar(50) null,
    col6 varchar(50) null,
    local_modified datetime null
)
    collate = utf8mb4_bin;
My ORM model:
from sqlalchemy import Column, DATETIME, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class DemoClass(Base):
    __tablename__ = 'table_datetime'
    local_modified = Column(DATETIME)
    # __tablename__ = 'table_varchar'
    # local_modified = Column(String)

    id = Column(Integer, primary_key=True)
    col1 = Column(String)
    col2 = Column(String)
    col3 = Column(String)
    col4 = Column(String)
    col5 = Column(String)
    col6 = Column(String)
Code:
# Method 1: bulk update
# Slow, 6~8 s for a 500-row update
# with Session() as session:
#     session.bulk_update_mappings(DemoClass, data)
#     session.commit()

# Method 2: query all once, manually update, then commit
# Fast, less than 1 s
with Session() as session:
    db_rows = session.query(DemoClass).all()  # 500 records in all
    for item in data:
        for db_row in db_rows:
            if item['id'] == db_row.id:
                db_row.col1 = item['col1']
                db_row.col2 = item['col2']
                db_row.col3 = item['col3']
                db_row.col4 = item['col4']
                db_row.col5 = item['col5']
                db_row.col6 = item['col6']
                # db_row.local_modified = item['local_modified']
                db_row.local_modified = datetime.strptime(item['local_modified'], '%Y-%m-%d %H:%M:%S')
                break
    session.commit()

# Method 3: construct an ORM object per row, session.merge each, then commit
# Slow, 6~8 s for a 500-row update
# try:
#     items = [DemoClass(id=d['id'], col1=d['col1'], col2=d['col2'], col3=d['col3'],
#                        col4=d['col4'], col5=d['col5'], col6=d['col6'])
#              for d in data]
# except KeyError as e:
#     return {'status_code': 400, 'error_msg': f"update failed, JSON payload missing key: {e}"}, 400
# try:
#     with Session() as session:
#         for m in items:
#             session.merge(m)  # merge is very slow
#         session.commit()
# except exc.SQLAlchemyError as e:
#     return {'status_code': 500, 'error_msg': 'server error, please contact admin'}, 500
My test environment:
Flask==2.0.1
PyMySQL==1.0.2
MySQL==1.0.2
SQLAlchemy==1.4.25
Python 3.6.5 on Windows
Remote MySQL server, server version 5.7.33-0ubuntu0.18.04.1
- Methods 1 and 3: 6~8 s to update 500 rows
- Method 2: <1 s to update 500 rows
Complete test code and setup is available here: https://github.com/ajfg93/sqlalchemy-demo/tree/datetime_slow
Why are session.merge and session.bulk_update_mappings so much slower? From the statements echoed by SQLAlchemy, I can tell that session.merge emits many SELECTs, so I guess that's the reason, but I don't know the reason for bulk_update_mappings.
By the way, what exactly is the correct way to upsert in SQLAlchemy? Googling led me to session.merge, which failed me. Is "Method 2" above the correct way?
CodePudding user response:
The reason for the time difference seems to be the same as in your other question, https://stackoverflow.com/a/69411298/7589014, namely that no change is persisted to the database because no net change is detected on the objects.
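You can see this directly from the echoed SQL. A minimal sketch, assuming the DemoClass model from your question and a placeholder connection URL (echo=True makes the engine print every statement it emits):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# placeholder URL; substitute your own credentials/host
engine = create_engine("mysql+pymysql://user:pass@host/testdb", echo=True)
Session = sessionmaker(bind=engine)

with Session() as session:
    row = session.get(DemoClass, 1)
    row.col1 = row.col1      # same value: the object is marked dirty ...
    session.commit()         # ... but no UPDATE is echoed, there is no net change

with Session() as session:
    row = session.get(DemoClass, 1)
    row.col1 = "new value"   # an actual change
    session.commit()         # UPDATE table_datetime SET col1=%s ... is echoed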
CodePudding user response:
I rewrote some of the code to ensure that the posted data is different on every run, so that SQLAlchemy actually performs the updates.
Then I tested again against a local MySQL instance, and every response time was <1 s.
So the real problem turned out to be that I was previously connecting to a remote MySQL server over the public network. I didn't expect network I/O to contribute so much overhead to the overall processing.
FYI, benchmark results:
- bulk_update_mappings: ~80 ms
- Method 2: ~150 ms, and passing a datetime string doesn't add much overhead
- session.merge: ~500 ms, the slowest, as expected
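For reference, if a true upsert (insert new rows, update existing ones) is needed rather than updating rows that are known to exist, one option is the MySQL dialect's INSERT ... ON DUPLICATE KEY UPDATE. A minimal sketch, assuming the same DemoClass model and data list of dicts as in the question (the helper name upsert_rows is just for illustration, and I have not benchmarked this variant):

from sqlalchemy.dialects.mysql import insert

def upsert_rows(session, data):
    # Multi-row INSERT targeting the mapped table; "data" is a list of dicts
    # with keys id, col1..col6, local_modified.
    stmt = insert(DemoClass.__table__).values(data)
    # On a duplicate primary key, overwrite the columns with the incoming values.
    stmt = stmt.on_duplicate_key_update(
        col1=stmt.inserted.col1,
        col2=stmt.inserted.col2,
        col3=stmt.inserted.col3,
        col4=stmt.inserted.col4,
        col5=stmt.inserted.col5,
        col6=stmt.inserted.col6,
        local_modified=stmt.inserted.local_modified,
    )
    session.execute(stmt)
    session.commit()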