How to make replication faster for mysql table with 8 million records-CodePudding

I take my Live server backup using mysqldump command via CRON job in my Ubuntu Server via Bash Shell Script and the same script uploads the backup to my backup server also. Earlier this was working fine but now I am facing slowness issue (it takes 1 hour to backup and upload on backup server) as one of the database Table size has grown to 5GB and consists of 10 Million Records. I saw on a thread that we can fasten the SQL insertion via bulk/group execution of SQL - How can mysql insert millions records faster?

But in my case I am unsure how can I create a Shell Script to perform the same.

The requirement is I want to export all my SQL Database tables in groups of maximum 10k so that execution can be faster while it is getting imported on the server.

I have written this code on my server bash script:

#!/bin/bash
cd /tmp
file=$(date  %F-%T).sql
mysqldump \
  --host ${MYSQL_HOST} \
  --port ${MYSQL_PORT} \
  -u ${MYSQL_USER} \
  --password="${MYSQL_PASS}" \
  ${MYSQL_DB} > ${file}
if [ "${?}" -eq 0 ]; then
  mysql -umyuser -pmypassword -h 198.168.1.3  -e "show databases"
  mysql -umyuser -pmypassword -h 198.168.1.3 -D backup_db -e "drop database backup_db"
  mysql -umyuser -pmypassword -h 198.168.1.3  -e "create database backup_db" 
  mysql -umyuser -pmypassword -h 198.168.1.3 backup_db < ${file}
  gzip ${file}
  aws s3 cp ${file}.gz s3://${S3_BUCKET}/live_db/
  rm ${file}.gz
else
  echo "Error backing up mysql"
  exit 255
fi

The backup server and live server share the same AWS hardware configuration: 16GB RAM, 4 CPU, 100GB SSD.

These are the screenshots and data:

Screenshot and Queries for Debugging on the Live Server: Information Schema Tables:

https://i.imgur.com/RnjQwbP.png

SHOW GLOBAL STATUS:

https://pastebin.com/raw/MuJYwnsm

SHOW GLOBAL VARIABLES:

https://pastebin.com/raw/wdvn97XP

Screenshot and Queries for Debugging on the Backup Server:

https://i.imgur.com/rB7qcYU.png
https://pastebin.com/raw/K7vHXqWi
https://pastebin.com/raw/PR2gWpqe

Server workload is almost negligible. There is no load all the times, I have also monitored via AWS Monitoring Panel, and that's the only reason to take more than required resource server so that it never gets exhausted. I have taken 16GB RAM and 4 CPU which are more than more than sufficient. AWS Monitoring Panel showed Max usage 6% rarely and maximum times it is around 1%.

CodePudding user response：

Analysis of GLOBAL STATUS and VARIABLES:

Observations:

Version: 10.3.32-MariaDB-0ubuntu0.20.04.1-log
16 GB of RAM
Uptime = 3d 05:41:50
42.1 QPS

The More Important Issues:

Some setting suggestions for better memory utilization:

key_buffer_size = 20M
innodb_buffer_pool_size = 8G
table_open_cache = 300
innodb_open_files = 1000
query_cache_type = OFF
query_cache_size = 0

Some setting suggestions for other reasons:

eq_range_index_dive_limit = 20
log_queries_not_using_indexes = OFF

Recommend using the slowlog (with long_query_time = 1) to locate the naughty queries. Then we can discuss how to improve them. http://mysql.rjweb.org/doc.php/mysql_analysis#slow_queries_and_slowlog

Details and other observations:

( (key_buffer_size - 1.2 * Key_blocks_used * 1024) ) = ((512M - 1.2 * 8 * 1024)) / 16384M = 3.1% -- Percent of RAM wasted in key_buffer. -- Decrease key_buffer_size (now 536870912).

( Key_blocks_used * 1024 / key_buffer_size ) = 8 * 1024 / 512M = 0.00% -- Percent of key_buffer used. High-water-mark. -- Lower key_buffer_size (now 536870912) to avoid unnecessary memory usage.

( (key_buffer_size / 0.20 innodb_buffer_pool_size / 0.70) ) = ((512M / 0.20 128M / 0.70)) / 16384M = 16.7% -- Most of available ram should be made available for caching. -- http://mysql.rjweb.org/doc.php/memory

( table_open_cache ) = 16,293 -- Number of table descriptors to cache -- Several hundred is usually good.

( innodb_buffer_pool_size ) = 128M -- InnoDB Data Index cache -- 128M (an old default) is woefully small.

( innodb_buffer_pool_size ) = 128 / 16384M = 0.78% -- % of RAM used for InnoDB buffer_pool -- Set to about 70% of available RAM. (To low is less efficient; too high risks swapping.)

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( innodb_io_capacity ) = 200 -- When flushing, use this many IOPs. -- Reads could be slugghish or spiky.

( innodb_io_capacity_max / innodb_io_capacity ) = 2,000 / 200 = 10 -- Capacity: max/plain -- Recommend 2. Max should be about equal to the IOPs your I/O subsystem can handle. (If the drive type is unknown 2000/200 may be a reasonable pair.)

( Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests ) = 3,565,561,783 / 162903322931 = 2.2% -- Read requests that had to hit disk -- Increase innodb_buffer_pool_size (now 134217728) if you have enough RAM.

( Innodb_pages_read / Innodb_buffer_pool_read_requests ) = 3,602,479,499 / 162903322931 = 2.2% -- Read requests that had to hit disk -- Increase innodb_buffer_pool_size (now 134217728) if you have enough RAM.

( Innodb_buffer_pool_reads ) = 3,565,561,783 / 279710 = 12747 /sec -- Cache misses in the buffer_pool. -- Increase innodb_buffer_pool_size (now 134217728)? (~100 is limit for HDD, ~1000 is limit for SSDs.)

( (Innodb_buffer_pool_reads Innodb_buffer_pool_pages_flushed) ) = ((3565561783 1105583) ) / 279710 = 12751 /sec -- InnoDB I/O -- Increase innodb_buffer_pool_size (now 134217728)?

( Innodb_buffer_pool_read_ahead_evicted ) = 5,386,209 / 279710 = 19 /sec

( Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group / innodb_log_file_size ) = 1,564,913,152 / (279710 / 3600) / 2 / 48M = 0.2 -- Ratio -- (see minutes)

( innodb_flush_method ) = innodb_flush_method = fsync -- How InnoDB should ask the OS to write blocks. Suggest O_DIRECT or O_ALL_DIRECT (Percona) to avoid double buffering. (At least for Unix.) See chrischandler for caveat about O_ALL_DIRECT

( default_tmp_storage_engine ) = default_tmp_storage_engine =

( innodb_flush_neighbors ) = 1 -- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.

( ( Innodb_pages_read Innodb_pages_written ) / Uptime / innodb_io_capacity ) = ( 3602479499 1115984 ) / 279710 / 200 = 6441.7% -- If > 100%, need more io_capacity. -- Increase innodb_io_capacity (now 200) if the drives can handle it.

( innodb_io_capacity ) = 200 -- I/O ops per second capable on disk . 100 for slow drives; 200 for spinning drives; 1000-2000 for SSDs; multiply by RAID factor.

( innodb_adaptive_hash_index ) = innodb_adaptive_hash_index = ON -- Whether to use the adapative hash (AHI). -- ON for mostly readonly; OFF for DDL-heavy

( innodb_adaptive_hash_index ) = innodb_adaptive_hash_index = ON -- Usually should be ON. -- There are cases where OFF is better. See also innodb_adaptive_hash_index_parts (now 8) (after 5.7.9) and innodb_adaptive_hash_index_partitions (MariaDB and Percona). ON has been implicated in rare crashes (bug 73890). 10.5.0 decided to default OFF.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.

( innodb_ft_result_cache_limit ) = 2,000,000,000 / 16384M = 11.6% -- Byte limit on FULLTEXT resultset. (Possibly not preallocated, but grows?) -- Lower the setting.

( local_infile ) = local_infile = ON -- local_infile (now ON) = ON is a potential security issue

( Qcache_lowmem_prunes ) = 6,329,393 / 279710 = 23 /sec -- Running out of room in QC -- increase query_cache_size (now 16777216)

( Qcache_lowmem_prunes/Qcache_inserts ) = 6,329,393/7792821 = 81.2% -- Removal Ratio (frequency of needing to prune due to not enough memory)

( Qcache_hits / Qcache_inserts ) = 1,619,341 / 7792821 = 0.208 -- Hit to insert ratio -- high is good -- Consider turning off the query cache.

( Qcache_hits / (Qcache_hits Com_select) ) = 1,619,341 / (1619341 9691638) = 14.3% -- Hit ratio -- SELECTs that used QC -- Consider turning off the query cache.

( Qcache_hits / (Qcache_hits Qcache_inserts Qcache_not_cached) ) = 1,619,341 / (1619341 7792821 278272) = 16.7% -- Query cache hit rate -- Probably best to turn off the QC.

( (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache / query_alloc_block_size ) = (16M - 1272984) / 3058 / 16384 = 0.309 -- query_alloc_block_size vs formula -- Adjust query_alloc_block_size (now 16384)

( Created_tmp_disk_tables ) = 1,667,989 / 279710 = 6 /sec -- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.

( Created_tmp_disk_tables / Questions ) = 1,667,989 / 11788712 = 14.1% -- Pct of queries that needed on-disk tmp table. -- Better indexes / No blobs / etc.

( Created_tmp_disk_tables / Created_tmp_tables ) = 1,667,989 / 4165525 = 40.0% -- Percent of temp tables that spilled to disk -- Maybe increase tmp_table_size (now 16777216) and max_heap_table_size (now 16777216); improve indexes; avoid blobs, etc.

( ( Com_stmt_prepare - Com_stmt_close ) / ( Com_stmt_prepare Com_stmt_close ) ) = ( 473 - 0 ) / ( 473 0 ) = 100.0% -- Are you closing your prepared statements? -- Add Closes.

( Com_stmt_close / Com_stmt_prepare ) = 0 / 473 = 0 -- Prepared statements should be Closed. -- Check whether all Prepared statements are "Closed".

( binlog_format ) = binlog_format = MIXED -- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)

( long_query_time ) = 5 -- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2

( Subquery_cache_hit / ( Subquery_cache_hit Subquery_cache_miss ) ) = 0 / ( 0 1800 ) = 0 -- Subquery cache hit rate

( log_queries_not_using_indexes ) = log_queries_not_using_indexes = ON -- Whether to include such in slowlog. -- This clutters the slowlog; turn it off so you can see the real slow queries. And decrease long_query_time (now 5) to catch most interesting queries.

( back_log ) = 80 -- (Autosized as of 5.6.6; based on max_connections) -- Raising to min(150, max_connections (now 151)) may help when doing lots of connections.

Abnormally small:

Delete_scan = 0.039 /HR
Handler_read_rnd_next / Handler_read_rnd = 2.06
Handler_write = 0.059 /sec
Innodb_buffer_pool_read_requests / (Innodb_buffer_pool_read_requests   Innodb_buffer_pool_reads ) = 97.9%
Table_locks_immediate = 2.6 /HR
eq_range_index_dive_limit = 0

Abnormally large:

( Innodb_pages_read   Innodb_pages_written ) / Uptime = 12,883
Com_release_savepoint = 5.5 /HR
Com_savepoint = 5.5 /HR
Handler_icp_attempts = 110666 /sec
Handler_icp_match = 110663 /sec
Handler_read_key = 62677 /sec
Handler_savepoint = 5.5 /HR
Handler_tmp_update = 1026 /sec
Handler_tmp_write = 40335 /sec
Innodb_buffer_pool_read_ahead = 131 /sec
Innodb_buffer_pool_reads * innodb_page_size / innodb_buffer_pool_size = 43524924.1%
Innodb_data_read = 211015030 /sec
Innodb_data_reads = 12879 /sec
Innodb_pages_read = 12879 /sec
Innodb_pages_read   Innodb_pages_written = 12883 /sec
Select_full_range_join = 1.1 /sec
Select_full_range_join / Com_select = 3.0%
Tc_log_page_size = 4,096
innodb_open_files = 16,293
log_slow_rate_limit = 1,000
query_cache_limit = 3.36e 7
table_open_cache / max_connections = 107

Abnormal strings:

Innodb_have_snappy = ON
Slave_heartbeat_period = 0
Slave_received_heartbeats = 0
aria_recover_options = BACKUP,QUICK
innodb_fast_shutdown = 1
log_output = FILE,TABLE
log_slow_admin_statements = ON
myisam_stats_method = NULLS_UNEQUAL
old_alter_table = DEFAULT
sql_slave_skip_counter = 0
time_zone =  05:30

CodePudding user response：

Rate Per Second = RPS

Suggestions to consider for your 'Backup' AWS instance Parameters Group

innodb_buffer_pool_size=10G  # from 128M to reduce innodb_data_reads RPS of 16
innodb_change_buffer_max_size=50  # from 25 percent to speed up INSERT completion
innodb_buffer_pool_instances=3  # from 1 to reduce mutex contention
innodb_write_io_threads=16  # from 4 for your intense data INSERT operations
innodb_buffer_pool_dump_pct=90  # from 25 percent to reduce WARM UP delays
innodb_fast_shutdown=0  #  from 1 to help avoid RECOVERY on instance START

You should find these changes reduce your refresh DATA time required on your BACKUP instance. Your LIVE instance has different operating characteristics but should have all these suggestions applied, as well as others. Please view profile for contact info and get in touch for additional assistance.

Your mention of REPLICATION should likely be ignored since you can not have SERVER_ID of 1 on both the MASTER and SLAVE with replication to survive. Your LIVE server can not be MASTER because LOG_BIN is OFF for the first reason.

LIVE observations, com_begin count was 30, com_commit was 0 after 3 days of uptime. Usually we find commit to be the same as com_begin. Did someone forget to COMMIT the data?

com_savepoint reported 430 operations. com_rollback_to_savepoint reported 430 operations. We do not normally see a rollback for every savepoint in 3 dqys.

com_stmt_prepare reported 473 operations. com_stmt_execute reported 1211 operations. com_stmt_close reported 0 operations. Forgetting to CLOSE prepared statements when done leaves resources in use that could have been released.

handler_rollback counter 961 in 3 days. Seems unusual for 3 days of uptime.

slow_queries counted 87338 in 3 days exceeding 5 seconds to completion. log_slow_verbosity=query_plan,explain would help your team identify the cause of slow. Your slow query logs is already ON.

Kerry, The VERY BEST to you and your team.