Home > Blockchain >  MySQL is utilizing 500% of the CPU and causing the website to slow down
MySQL is utilizing 500% of the CPU and causing the website to slow down

Time:02-06

I have a virtual machine with 64 vCPUs and 256GB of memory. Recently, I decided to perform some stress tests on the website that is running on this virtual machine. The entire VM is only for this website.

The first test I ran was with 20,000 users per second, and the average response time was around 1400ms. During the test, the site was not usable.

enter image description here

After that, I decided to inspect the top processes to identify the source of the problem. These are the processes and their CPU utilization during the test:

top - 10:30:19 up 1 day, 34 min,  0 users,  load average: 8.39, 3.04, 1.46
Tasks: 711 total,   2 running, 709 sleeping,   0 stopped,   0 zombie
%Cpu(s):  6.0 us,  9.8 sy,  3.8 ni, 79.2 id,  0.2 wa,  0.0 hi,  0.9 si,  0.0 st
MiB Mem : 257925.6 total, 219425.1 free,   3658.2 used,  34842.3 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used. 252346.8 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME  COMMAND                                        
218159 mysql     20   0 6911232  96204  19792 S 491.4   0.0   4:24.99 mysqld                                         
139405 nobody    20   0   54948  34196   6128 D  44.9   0.0   0:52.17 litespeed                                      
218251 obl74   21   1  347708  29228  19328 S  40.9   0.0   0:20.83 lsphp                                          
218402 obl74   21   1  347708  29152  19264 S  40.9   0.0   0:22.35 lsphp                                          
218955 obl74   21   1  273004  21336  12472 D  40.9   0.0   0:22.39 lsphp                                          
218957 obl74   21   1  273004  21336  12472 D  40.9   0.0   0:22.22 lsphp                                          
218961 obl74   21   1  273004  21336  12472 S  40.9   0.0   0:22.37 lsphp                                          
218963 obl74   21   1  273004  21328  12468 S  40.9   0.0   0:22.31 lsphp                                          
218252 obl74   21   1  347708  29228  19328 D  40.5   0.0   0:22.42 lsphp                                          
218407 obl74   21   1  347708  29152  19264 D  40.5   0.0   0:22.30 lsphp                                          
218956 obl74   21   1  273004  21332  12472 S  40.5   0.0   0:20.73 lsphp                                          
218959 obl74   21   1  273004  21336  12472 S  40.5   0.0   0:22.13 lsphp 

Interestingly, despite the website's poor performance during the test, neither the CPU nor memory usage was particularly high. Also, during the test, CyberPanel indicated a CPU usage of 19% and a memory usage of 2%. Therefore, I conclude that the server is not experiencing any resource constraints, as it is not utilizing all its CPU and memory. However, it is still lagging for some reason.

Then, I decided to remove the components related to MySQL from the page on which I performed the stress test. The outcome was much more stable.

enter image description here

top - 10:43:54 up 1 day, 47 min,  0 users,  load average: 0.87, 1.23, 1.41
Tasks: 705 total,   5 running, 699 sleeping,   0 stopped,   1 zombie
%Cpu(s):  2.8 us,  1.0 sy,  0.4 ni, 95.2 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st
MiB Mem : 257925.6 total, 218249.7 free,   3910.0 used,  35765.9 buff/cache
MiB Swap:   2048.0 total,   2048.0 free,      0.0 used. 252098.9 avail Mem 

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME  COMMAND                                        
139416 nobody    20   0   53200  32480   6128 S  18.3   0.0   0:47.00 litespeed                                      
139402 nobody    20   0   52928  33308   7204 S  16.6   0.0   0:44.40 litespeed                                      
139409 nobody    20   0   54900  34136   6188 S  16.6   0.0   0:46.38 litespeed                                      
139410 nobody    20   0   49904  29156   6128 S  16.6   0.0   0:35.43 litespeed                                      
139414 nobody    20   0   51688  30936   6128 R  16.6   0.0   0:45.46 litespeed                                      
139415 nobody    20   0   55492  35280   6680 R  15.9   0.0   0:46.24 litespeed                                      
139412 nobody    20   0   52112  31420   6188 S  15.6   0.0   0:45.05 litespeed                                      
139404 nobody    20   0   50396  29644   6128 S  15.3   0.0   0:44.83 litespeed                                      
139413 nobody    20   0   44700  23816   6128 S  15.3   0.0   0:21.83 litespeed                                      
139406 nobody    20   0   50752  30004   6128 S  15.0   0.0   1:05.25 litespeed 

According to CyberPanel, during the new test, the CPU usage was 4% and the memory usage was 2%.

Therefore, it is obvious that there is an issue with MySQL. I am currently using the default my.cnf configuration provided by CyberPanel, but I have attempted various other configurations found on the internet, yet nothing has improved the performance even a little bit. I've also tried stuff like MySQL Tuner but it didn't change the performance.

The MySQL part that I removed for the second test was a basic query of a table that contained 7 rows. It verified the user's IP address to determine if they were on the IP whitelist. This operation should not have posed a significant problem.

As observed in both tests, I detect a threshold or a bottleneck at the start, beyond which the site experiences a sharp increase in lag. Despite having ample free memory and CPU, there seems to be some limiting factor.

Some might argue that a rate of 20,000 users per second is excessive and unrealistic. However, even when I conducted the test with only 250 users per second, the outcome was the same: the website was extremely slow and not usable.

enter image description here

At this point I am totally lost. I am uncertain as to where to focus my efforts and what steps to take next to decrease the average response time. I would greatly appreciate any insightful comments or suggestions you may have and I thank you in advance for your time and consideration.

I have attempted various my.cnf configurations, as well as suggestions from MySQL Tuner. I have also investigated PHP and Litespeed tuning options.

CodePudding user response:

For 20K users per second, you need multiple servers and switches in front of them. Period. End of discussion.

Well, OK, I'll discuss it further.

When MySQL is presented with lots of 'simultaneous' users, it plays fair with them -- each is given equal access to all resources. This is fine until it falls off the cliff. This is when most of the processing is dealing with sharing of the resources. All the threads will eventually finish, but each will take a long time and you (the DBA) will think it crashed and pull the plug.

A simple cure is to lower (YES, lower) the value of max_connections. It turns out that the "cliff" is at a few dozen connections.

Is benchmarking, one throws as much stuff at the server until it croaks. That's usually a few dozen.

In real life web pages are not doing 100% database operations, they are letting the user react, building pages, etc. So, a max_connections of a few hundred is realistic.

Once it reaches the cliff, latency goes through the roof. You would expect throughput to increase, too, but it decreases slightly. I believe that this is because the threads are stumbling over each other too much. Think about any "cache" (buffer_pool, open_tables, table_definitions, etc) -- if "too many" threads are running, the caches may become ineffective.

Think about a market with so many shoppers that they spend most of their time juggling around other people. More shoppers per hour can get through the market if they keep shoppers from entering when it is "full". max_connections is that limiter.

INDEXes needed

ALTER TABLE table_name ADD INDEX(zone);
ALTER TABLE table_name ADD INDEX(IPPool);

(Then take a crash course in the benefits of Indexes (aka "KEYs").)

CodePudding user response:

Suggestions to consider for your CloudSQL configuration

innodb_buffer_pool_size=8G  #  from ~ 192G because current data is less than 1G
innodb_io_capacity=500  #  from 200 to utilize more of your SSD IOPS
innodb_lru_scan_depth=100  # from 1024 to conserve 90% CPU cycles used every second for function
key_buffer_size=20M  # from ~ 128M needed for tmp tbl management, NO MyISAM tbls
sql_log_bin=0  # from ON unless you have a need for this specific log

Please view profile for contact info. Other performance enhancements available.

  • Related