My Mariadb installation works well with normal times. However during peak usage, all CPU cores are being utilized 100% constantly. There are around 250000 active users during peak usage times. The cores have been scaled vertically from 8 to 16 to 32 for now. I am still facing issues of high utilization of cores during peak times. PFB the htop result:
All CPU Cores: 100%
Mem 5.25G/126G
Swp 0K/0K
Tasks: 43, 961 thr; 32 running
Load average: 150 148 148
The values of few important commands are below:
cat /proc/sys/kernel/threads-max
1030492
cat /proc/sys/kernel/pid_max
4194304
cat /proc/sys/vm/max_map_count
65530
innodb_buffer_pool_size=55G
max_connections = 50000
MariaDB version: 10.6
Slow Query main issue:
# User@Host: lrp_dbuser[lrp_dbuser] @ lrp-application-server-2.internal.cloudapp.net [10.10.0.5]
# Thread_id: 3630886 Schema: lrp QC_hit: No
# Query_time: 50.155999 Lock_time: 0.049785 Rows_sent: 0 Rows_examined: 150617
# Rows_affected: 1 Bytes_sent: 52
SET timestamp=1666331690;
UPDATE `session_lrp` SET `timestamp` = 1666331640
WHERE `id` = 'pghjejnvf8ou61usog5983h6lihh4jr0';
# Time: 221021 5:54:51
# User@Host: lrp_dbuser[lrp_dbuser] @ lrp-application-server-2.internal.cloudapp.net [10.10.0.5]
# Thread_id: 3632952 Schema: lrp QC_hit: No
# Query_time: 50.334102 Lock_time: 0.050021 Rows_sent: 0 Rows_examined: 150632
# Rows_affected: 1 Bytes_sent: 52
SET timestamp=1666331691;
UPDATE `session_lrp` SET `timestamp` = 1666331640
WHERE `id` = 'hsun0ggh48i8t4pbv9u6ac2f13fdkujf';
# User@Host: lrp_dbuser[lrp_dbuser] @ lrp-application-server-2.internal.cloudapp.net [10.10.0.5]
# Thread_id: 3632570 Schema: lrp QC_hit: No
# Query_time: 59.245218 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 0
# Rows_affected: 0 Bytes_sent: 70
SET timestamp=1666331691;
SELECT GET_LOCK('62f67a85ee74fca9c3b3c30a20f70f82', 300) AS ci_session_lock;
# User@Host: lrp_dbuser[lrp_dbuser] @ lrp-application-server-2.internal.cloudapp.net [10.10.0.5]
# Thread_id: 3634754 Schema: lrp QC_hit: No
# Query_time: 15.116038 Lock_time: 0.000000 Rows_sent: 1 Rows_examined: 0
# Rows_affected: 0 Bytes_sent: 70
SET timestamp=1666331691;
SELECT GET_LOCK('6a046e7d67351ce96e8ed43e2c27eaed', 300) AS ci_session_lock;
# User@Host: lrp_dbuser[lrp_dbuser] @ lrp-application-server-1.internal.cloudapp.net [10.10.0.4]
# Thread_id: 3632996 Schema: lrp QC_hit: No
# Query_time: 50.000255 Lock_time: 0.050056 Rows_sent: 0 Rows_examined: 0
# Rows_affected: 0 Bytes_sent: 67
SET timestamp=1666331691;
UPDATE `session_lrp` SET `timestamp` = 1666331641
WHERE `id` = '67aofej14f1eq8aqobq0hecuh7qavp1d';
Please let me know what do i need to optimize.
CodePudding user response:
Lack of INDEX?
# Query_time: 50.155999 Lock_time: 0.049785 Rows_examined: 150617
# Rows_affected: 1 Bytes_sent: 52
SET timestamp=1666331690;
UPDATE `session_lrp` SET `timestamp` = 1666331640
WHERE `id` = 'pghjejnvf8ou61usog5983h6lihh4jr0';
Is there an index on id
? Please provide SHOW CREATE TABLE session_lrp
Usage of GET_LOCK?
# Query_time: 59.245218
# Rows_affected: 0 Bytes_sent: 70
SET timestamp=1666331691;
SELECT GET_LOCK('62f67a85ee74fca9c3b3c30a20f70f82', 300)
AS ci_session_lock;
Please explain the usage of GET_LOCK
-- It may be the cause of your problem. And there may be a much better workaround.
Timeout -> ROLLBACK
# Query_time: 50.000255 Lock_time: 0.050056
Do you check for errors? The query was not performed! Being that close to 50 seconds smells like you hit innodb_lock_wait_timeout
.
Do not increase that setting. Instead:
- fix the cause
- decrease the timeout
- change the code to detail with this and every other query that gets an error.