Why session.merge and session.bulk_update_mappings are so much slower than "query all -> manually update all"


I've been developing a very simple webpage in recent days, and I've decided to use SQLAlchemy and Flask as the back-end stack.

I was trying to find a way to upsert data. I found session.merge in an SO question, and a colleague also told me to use session.bulk_update_mappings. However, both are much slower than "query all -> manually update all".


Table DDL:

create table if not exists testdb.table_datetime
(
    id int auto_increment
        primary key,
    col1 varchar(50) null,
    col2 varchar(50) null,
    col3 varchar(50) null,
    col4 varchar(50) null,
    col5 varchar(50) null,
    col6 varchar(50) null,
    local_modified datetime null
)
collate=utf8mb4_bin;

My ORM model:

class DemoClass(Base):
    __tablename__ = 'table_datetime'
    local_modified = Column(DATETIME)

    # alternative mapping used for the varchar-column benchmark:
    # __tablename__ = 'table_varchar'
    # local_modified = Column(String)
    id = Column(Integer, primary_key=True)
    col1 = Column(String)
    col2 = Column(String)
    col3 = Column(String)
    col4 = Column(String)
    col5 = Column(String)
    col6 = Column(String)

Code:

# Method 1: bulk_update_mappings
# Slow, 6~8s for a 500-row update
# with Session() as session:
#     session.bulk_update_mappings(DemoClass, data)
#     session.commit()

# Method 2: query all once, manually update each matching row, then commit
# Fast, less than 1s
with Session() as session:
    db_rows = session.query(DemoClass).all() # 500 records in all
    for item in data:
        for db_row in db_rows:
            if item['id'] == db_row.id:
                db_row.col1 = item['col1']
                db_row.col2 = item['col2']
                db_row.col3 = item['col3']
                db_row.col4 = item['col4']
                db_row.col5 = item['col5']
                db_row.col6 = item['col6']
                # db_row.local_modified = item['local_modified']
                db_row.local_modified = datetime.strptime(item['local_modified'], '%Y-%m-%d %H:%M:%S')
                break
    session.commit()

# Method 3: construct an ORM instance per row, session.merge each, then commit
# Slow, 6~8s for a 500-row update
# try:
#     items = [DemoClass(id=d['id'], col1=d['col1'], col2=d['col2'],col3=d['col3'],
#                             col4=d['col4'],col5=d['col5'],col6=d['col6']
#                             ) for d in data]
# except KeyError as e:
#     return {'status_code': 400, 'error_msg': f"update failed, JSON payload missing key: {e}"}, 400
# try:
#     with Session() as session:
#         for m in items:
#             # print(3) # merge is very slow
#             session.merge(m)
#         session.commit()
#        # print(4)
# except exc.SQLAlchemyError as e:
#     return {'status_code': 500, 'error_msg': 'server error, please contact admin'}, 500
# # print(5)
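
For reference, data is not defined in the snippets above; all three methods assume it is a list of dicts keyed by column name and including the primary key. The shape below is an illustration, not the exact payload from the repo:

data = [
    {
        'id': 1,                                  # primary key, required by bulk_update_mappings and merge
        'col1': 'a', 'col2': 'b', 'col3': 'c',
        'col4': 'd', 'col5': 'e', 'col6': 'f',
        'local_modified': '2021-10-02 12:00:00',  # string, parsed with strptime in Method 2
    },
    # ... one dict per row to update, 500 in the benchmark
]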

My test environment:

Flask==2.0.1
PyMySQL==1.0.2
MySQL==1.0.2
SQLAlchemy==1.4.25
Python 3.6.5 Windows
MySQL remote server, Server version 5.7.33-0ubuntu0.18.04.1
  • Method 1 and 3: 6~8s to update 500 rows
  • Method 2: <1s to update 500 rows

Complete test code and setup is available here: https://github.com/ajfg93/sqlalchemy-demo/tree/datetime_slow


Why are session.merge and session.bulk_update_mappings so much slower? From the SQL echoed by SQLAlchemy, I can tell that session.merge emits many SELECTs, and I guess that's the reason. But I don't know the reason for bulk_update_mappings.

By the way, what exactly is the correct way to upsert in SQLAlchemy? Googling led me to session.merge, which failed me. Is "Method 2" above the correct way?
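
(For context on native upserts: MySQL supports INSERT ... ON DUPLICATE KEY UPDATE, and SQLAlchemy 1.4 exposes it through its MySQL dialect. Below is a minimal sketch against the table above, reusing the data payload; it is an illustration under those assumptions, not code from my project:)

from sqlalchemy.dialects.mysql import insert

# Build a single multi-row INSERT ... ON DUPLICATE KEY UPDATE statement,
# so the whole upsert is one round trip to the server.
stmt = insert(DemoClass.__table__).values(data)
stmt = stmt.on_duplicate_key_update(
    col1=stmt.inserted.col1,
    col2=stmt.inserted.col2,
    col3=stmt.inserted.col3,
    col4=stmt.inserted.col4,
    col5=stmt.inserted.col5,
    col6=stmt.inserted.col6,
    local_modified=stmt.inserted.local_modified,
)

with Session() as session:
    session.execute(stmt)
    session.commit()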

CodePudding user response:

The reason for the time difference seems to be the same as in your other question, https://stackoverflow.com/a/69411298/7589014, namely that no change is persisted to the DB because no change is detected on the objects.

CodePudding user response:

I rewrote the code to ensure that the posted data is different on every run, so that SQLAlchemy would actually perform the updates.
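
(Roughly like this; a sketch of how the payload can be varied per run so the UPDATEs are not no-ops, not the exact code from the repo:)

import uuid
from datetime import datetime

# A new random suffix each run guarantees every column value actually changes,
# so SQLAlchemy really emits UPDATE statements instead of detecting "no change".
suffix = uuid.uuid4().hex[:8]
data = [
    {
        'id': i,
        'col1': f'a-{suffix}', 'col2': f'b-{suffix}', 'col3': f'c-{suffix}',
        'col4': f'd-{suffix}', 'col5': f'e-{suffix}', 'col6': f'f-{suffix}',
        'local_modified': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
    }
    for i in range(1, 501)
]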

Then I tested it again against a local MySQL server, and all response times were <1s.

So the real problem turned out to be that I was previously connecting to a remote MySQL server over the public network. I didn't expect network I/O to contribute so much overhead to the overall processing.

FYI, benchmark results against the local MySQL server:

  • bulk_update_mappings: ~80ms
  • Method 2: ~150ms; passing a datetime string doesn't add much overhead
  • session.merge: ~500ms, the slowest, as expected