I've been developing a very simple web page over the past few days, and I decided to use SQLAlchemy and Flask as the back-end stack.
Table DDL:
create table if not exists testdb.table_datetime
(
    id int auto_increment
        primary key,
    col1 varchar(50) null,
    col2 varchar(50) null,
    col3 varchar(50) null,
    col4 varchar(50) null,
    col5 varchar(50) null,
    col6 varchar(50) null,
    local_modified datetime null
)
collate=utf8mb4_bin;
My ORM model:
from sqlalchemy import Column, Integer, String, DATETIME
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class DemoClass(Base):
    __tablename__ = 'table_datetime'
    local_modified = Column(DATETIME)
    # __tablename__ = 'table_varchar'
    # local_modified = Column(String)
    id = Column(Integer, primary_key=True)
    col1 = Column(String)
    col2 = Column(String)
    col3 = Column(String)
    col4 = Column(String)
    col5 = Column(String)
    col6 = Column(String)
Code:
from datetime import datetime

# data: list of dicts parsed from the POSTed JSON, one per row to update
with Session() as session:
    db_rows = session.query(DemoClass).all()
    for item in data:
        for db_row in db_rows:
            if item['id'] == db_row.id:
                db_row.col1 = item['col1']
                db_row.col2 = item['col2']
                db_row.col3 = item['col3']
                db_row.col4 = item['col4']
                db_row.col5 = item['col5']
                db_row.col6 = item['col6']
                # db_row.local_modified = item['local_modified']  # Very slow, 6-8s for 500 rows of data
                db_row.local_modified = datetime.strptime(item['local_modified'], '%Y-%m-%d %H:%M:%S')  # Fast, <1s for 500 rows of data
                break
    session.commit()
I found that assigning the string '2021-10-02 11:12:34' directly to the ORM DATETIME column is much slower than first converting it with datetime.strptime('2021-10-02 11:12:34', '%Y-%m-%d %H:%M:%S') and assigning the resulting datetime object.
My test environment:
Flask==2.0.1
PyMySQL==1.0.2
MySQL==1.0.2
SQLAlchemy==1.4.25
Python 3.6.5 on Windows
MySQL remote server, Server version 5.7.33-0ubuntu0.18.04.1
The result: for 500 rows to update, assigning a datetime string takes 6 to 10 seconds, while converting the string to a datetime object first and then assigning it takes less than 1 second.
Complete test code and setup is available here: https://github.com/ajfg93/sqlalchemy-demo/tree/datetime_slow
My question is, is that performance degradation reasonable (less than 1 second vs 6 to 10 seconds)?
My guess is that when I pass a datetime string instead of a datetime object, some internal code (SQLAlchemy or MySQL, I don't know which does the work) may try to guess the string format and build the datetime object it thinks is correct. But is it really that much slower? Are there any other reasons?
If I hadn't tested the code, I would probably never have realized that it was db_row.local_modified = item['local_modified'] that slowed down the whole processing flow. I mean, I would prefer SQLAlchemy to raise an exception telling me the data types mismatch, rather than causing this huge performance degradation.
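As an aside, here is a minimal sketch of the kind of fail-fast check I have in mind, using SQLAlchemy's validates decorator (illustrative only, not part of the benchmark repo):

from datetime import datetime
from sqlalchemy.orm import validates

class DemoClass(Base):
    __tablename__ = 'table_datetime'
    # ... same columns as above ...

    @validates('local_modified')
    def _require_datetime(self, key, value):
        # Reject anything that is not a datetime so a type mismatch fails loudly
        # instead of being silently passed through to the driver as a string.
        if value is not None and not isinstance(value, datetime):
            raise TypeError(f"{key} expects a datetime object, got {type(value).__name__}")
        return value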
CodePudding user response:
Using the attached example, the difference is due to the fact that in the datetime case SQLAlchemy finds no change to the objects, so no UPDATE is emitted to the db, as can be verified by looking at the log printed by the engine.
If you add timedelta(1) to the date, you should see similar times in both cases.
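For example, a small tweak to the assignment in the loop above (a sketch only; the variable names follow the question's code) guarantees that the stored value actually changes, so an UPDATE is emitted on every run:

from datetime import datetime, timedelta

parsed = datetime.strptime(item['local_modified'], '%Y-%m-%d %H:%M:%S')
db_row.local_modified = parsed + timedelta(days=1)  # differs from the stored value, so the row is marked dirty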
Also, regarding the manipulation of the strings: SQLAlchemy does not parse them, they are passed to the DBAPI as-is. You can verify this by looking at the logs, where you will see the arguments that are passed to the db.
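For example, the emitted statements and their bound parameters become visible if the engine is created with echo enabled (a sketch; the connection URL is a placeholder):

from sqlalchemy import create_engine

engine = create_engine(
    "mysql+pymysql://user:password@host/testdb",  # placeholder credentials
    echo=True,  # log every SQL statement and its parameters
)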
Regarding the time taken, for me it takes less than 1s in all cases.
CodePudding user response:
I rewrote some code to ensure that the posted data is different every time, so that SQLAlchemy would actually perform the updates.
Then I tested it again against a local MySQL instance and every response time was <1s.
So the real problem turned out to be that I had been connecting to a remote MySQL server (over the public network) before. I didn't expect network IO to contribute so much overhead to the whole process.
FYI, benchmark results:
- bulk_update_mappings: ~80ms
- Method 2: ~150ms, and passing a datetime string doesn't add much overhead
- session.merge: ~500ms, the slowest, as expected.
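For reference, a sketch of the bulk_update_mappings variant (assuming data is the same list of dicts as in the question, with keys matching the mapped column names):

from datetime import datetime

with Session() as session:
    # Each dict must contain the primary key ('id') plus the columns to update.
    mappings = [
        {**item, 'local_modified': datetime.strptime(item['local_modified'], '%Y-%m-%d %H:%M:%S')}
        for item in data
    ]
    session.bulk_update_mappings(DemoClass, mappings)
    session.commit()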