I have a Python module to handle my SQLite database, and this module provides two functions:
init_database()
creates the database file with all theCREATE TABLE
statements, it is executed if the database does not exist yet.upgrade_database()
updates the schema of the database (ALTER TABLE
and other changes), it is executed when migrating from an older version of my program.
To ensure this critical part of the application works as expected, I wrote some tests for PyTest, to check that once these functions are executed, we get exactly some content.
My test to check the init part looks like this:
def test_init_database():
# PATH_DIRNAME is a constant with the path where the files are stored
db_path = f"{PATH_DIRNAME}/files/database.db"
if path.exists(db_path):
# Delete the file if it exists (happens if the test has already been run)
remove(db_path)
get_db(db_path).init_database()
with open(f"{PATH_DIRNAME}/files/database/database.sha256", "r") as file:
# The file may contain a blank line at the end, we just take the first one
# This is the hash sum we should get if the file is correctly generated
expected_sum = file.read().split("\n")[0]
with open(db_path, "rb") as file:
# Compute the SHA-256 hash sum of the generated database file, and compare it
assert hashlib.sha256(file.read()).hexdigest() == expected_sum
If I run this test locally, it passes with no problem. But if I run it on GitHub Action, it fails on the assertion, because the hash is different. Then I've configured my GH Action workflow to upload the generated files in an artifact, so I could check them by myself, and it looks like there is a subtle difference between the file generated on my local environment and the one generated in my workflow:
$ xxd gha_workflow/database.db > gha_workflow.hex
$ xxd local/database.db > local.hex
$ diff local.hex gha_workflow.hex
7c7
< 00000060: 002e 5f1d 0d0f f800 0a0b ac00 0f6a 0fc7 .._..........j..
---
> 00000060: 002e 5f1c 0d0f f800 0a0b ac00 0f6a 0fc7 .._..........j..
Note as the fourth byte is different (1d
vs 1c
).
What could cause this difference? Am I doing my test wrong?
CodePudding user response:
Depending on the sqlite version or build options your resulting database may vary in its format. For example the version and the page size can change and that may alter the physical format of your database. Depending on what you're trying to achieve, you may want to try to compare with a logical representation of your schema and content.
You can check these documentation pages for more information on the file format and the build options:
- https://www.sqlite.org/compile.html
- https://www.sqlite.org/formatchng.html
- https://www.sqlite.org/fileformat2.html
CodePudding user response:
Based on the advice given in comments, I think I have found a better way to test my database schema without depending on my file's metadata.
Instead of computing the SHA-256 sum, I get the table schema and compare it to the fields I expect to see if the database is well formed.
To get the schema, SQLite provides the following syntax:
PRAGMA table_info('my_table');
I can run it with Python like any regular SQL query, and it will return a list of tuples that contain, for each field:
- the position
- the field name
- the field type
- 0 if it is nullable, 1 otherwise
- the default value (or
None
if there isn't) - 1 if it a primary key
Thanks for helping me to understand my mistake!