Efficiently seeding MySQL with a large amount of data before tests


I want to introduce testing into a huge legacy system with a web API. Because the framework (Symfony 2) lacks the features I need, I decided to write Postman tests and exercise the system from the outside. The problem is that a large database has to be in place for the tests to work, and it needs to be in a known state for each test (it can't simply be shared by all tests, because they may alter the data). From my measurements, restoring from a SQL dump takes around 40 seconds, which is not acceptable per test. I need a solution, or I will have to give up on testing, which I do not want to do. One approach I have come up with, but still need to verify, is to:

  1. Bring up a MySQL Docker container and use a SQL dump to get the database to its initial state.
  2. Copy the MySQL data volume somewhere safe, called initdata.
  3. Copy initdata to the location used as the MySQL data volume.
  4. Run the test.
  5. Delete the container and the modified data volume.
  6. Repeat from step 3 for each test.

This is the general idea, but I need to know whether this works with the MySQL Docker image and whether copying volumes is actually fast enough, or whether there is any other sane solution for this situation.
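
In shell terms, the per-test reset I have in mind would look roughly like this (all paths and names below are placeholders, not a working setup):

# illustrative sketch of steps 3-6: restore a pristine data directory before each test
docker stop test-mysql > /dev/null 2>&1; docker rm test-mysql > /dev/null 2>&1   # step 5
rm -rf /var/test/mysql_data                        # discard the modified data
cp -a /var/test/initdata /var/test/mysql_data      # step 3: restore the pristine copy
docker run -d --name test-mysql \
    -v /var/test/mysql_data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD=secret \
    mysql:8.0
# step 4: run the test against the freshly reset database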

CodePudding user response:

I helped a company that wanted to test a 1 TB MySQL database repeatedly. The solution we ended up with was LVM filesystem snapshots, which let us revert the whole filesystem to its saved state almost instantly.
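
For illustration only, an LVM-based reset could look roughly like this (volume group, LV names and sizes are hypothetical):

# take a snapshot of the logical volume holding the MySQL datadir
lvcreate --snapshot --size 10G --name mysql_snap /dev/vg0/mysql_data

# ... run a test that modifies the database ...

# revert: stop MySQL, merge the snapshot back into the origin, remount, restart
systemctl stop mysql
umount /var/lib/mysql                     # the origin must not be in use for the merge to start
lvconvert --merge /dev/vg0/mysql_snap     # rolls the origin back to the snapshot state
mount /dev/vg0/mysql_data /var/lib/mysql
systemctl start mysql
# take a fresh snapshot before the next test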

But if using filesystem snapshots is not an option, you may still have to use some backup/restore tool.

Logical data loads (i.e. importing a mysqldump file) are known to be very time-consuming. There are some alternative tools like mysqlpump or mydumper, but they're all pretty slow.
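
To make "logical" concrete: the dump is plain SQL, and a restore re-executes every statement, which is what makes it slow on big datasets (the database name here is illustrative):

# logical backup: plain SQL text
mysqldump -uroot -p --single-transaction mydb > mydb.sql
# logical restore: every CREATE TABLE / INSERT is replayed through the SQL layer
mysql -uroot -p mydb < mydb.sql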

Physical backup tools like Percona XtraBackup are much faster, especially on restore. But restoring a physical backup is a bit tricky because the MySQL Server must be shut down.
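
A physical backup and restore with XtraBackup looks roughly like this (paths are illustrative; note that the server has to be stopped and the datadir emptied before copying back):

# take a physical backup while the server is running
xtrabackup --backup --target-dir=/backups/base
# apply the redo log so the backup files are consistent
xtrabackup --prepare --target-dir=/backups/base
# restore: stop the server, empty the datadir, copy the files back
systemctl stop mysql
rm -rf /var/lib/mysql/*
xtrabackup --copy-back --target-dir=/backups/base
chown -R mysql:mysql /var/lib/mysql
systemctl start mysql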

There's a good comparison of the performance of backup/restore tools for MySQL in this recent blog: https://www.percona.com/blog/backup-restore-performance-conclusion-mysqldump-vs-mysql-shell-utilities-vs-mydumper-vs-mysqlpump-vs-xtrabackup/

CodePudding user response:

So this is what we did, and I'm writing it here for anyone who has faced the same problem. We built a MySQL image with our data already imported from a mysqldump file; for each test we bring up a container from that image, run the test, then stop and remove the container, and repeat. This method is quite efficient: bringing the container up and then stopping and removing it takes around 5 seconds per test for a database with 500 tables and a 55 MB dump (we removed all unnecessary rows). Here is the Dockerfile and the process we used to build the image:

# start from our base MySQL image and bake the pre-imported data directory into the image
FROM docker.supply.codes/base/mysql:8.0.26
COPY ./mysql_data /var/lib/mysql

We also have a script that runs every time the dump is updated in Git; it imports the dump, builds the image, and pushes it to a Docker registry:

# run a MySQL container backed by a named volume
docker run -d --name $MYSQL_CONTAINER_NAME \
    -v mysql_data:/var/lib/mysql \
    -e MYSQL_ROOT_PASSWORD=$MYSQL_ROOT_PASSWORD \
    -e MYSQL_DATABASE=$MYSQL_DATABASE \
    $MYSQL_IMAGE mysqld --default-authentication-plugin=mysql_native_password
# wait until the MySQL container is completely up
sleep 10
# import the mysqldump
docker exec -i $MYSQL_CONTAINER_NAME sh -c 'exec mysql -uroot -p"$MYSQL_ROOT_PASSWORD" $MYSQL_DATABASE' < ./$MYSQL_DUMP_FILE_NAME
docker stop $MYSQL_CONTAINER_NAME
# mount the populated data volume in a throwaway Alpine container
docker run -d --name $ALPINE_CONTAINER_NAME -v mysql_data:/mysql_data $ALPINE_IMAGE
# copy the data directory out of the volume into the local build context
docker cp $ALPINE_CONTAINER_NAME:/mysql_data .
# build the image with the data baked in (see the Dockerfile above)
docker build -t $DOCKER_IMAGE_TAG .
# push it to the registry
docker push $DOCKER_IMAGE_TAG

Quite frankly, I don't understand the need for copying the data to an Alpine container and then back to the local machine, but our DevOps engineer said it's required because this runs in GitLab CI.

And this is the script that runs the Postman collections with the Newman CLI; it starts and stops a database container built from that image for each test:

for filename in ./collections/*.json; do
    # start the Symfony test db container
    docker run --name "$dbContainerName" --network="$networkName" -d "$dbImageName" > /dev/null

    # wait until MySQL is up
    sleep 5

    # run the collection
    newman run "$filename" -d parameters.json
    returnCode=$?

    # add the test name and its result to the log
    nameWithoutPath="${filename##*/}"
    name="${nameWithoutPath%.postman_collection.json}"
    tests+=("$name")
    testResults+=("$returnCode")

    # stop and remove the Symfony test db container
    docker stop "$dbContainerName" > /dev/null
    docker rm "$dbContainerName" > /dev/null
done
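
A note on the fixed sleep 10 / sleep 5 waits: they have been good enough for us, but on a slower machine a readiness poll along these lines (same variables as above) would be less fragile:

# poll until mysqld actually answers instead of sleeping a fixed amount of time
# (mysqladmin ping exits 0 once the server is up, even if the connection is refused for auth reasons)
until docker exec "$dbContainerName" mysqladmin ping --silent > /dev/null 2>&1; do
    sleep 1
done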