Short
What's the most efficient and effective way to store a git revision hash (SHA-1) such as f844cdc09651448d6c3e765fadac448253a16928
into a PostgreSQL database (> v.11) with psycopg2
?
Details and code
I have an SHA-1 hash as an hexadecimal string in Python, which I would like to store in a PostgreSQL database:
import psycopg2
from subprocess import Popen, PIPE
psycopg2.__version__ # prints '2.9.1 (dt dec pq3 ext lo64)'
cmd_list = [ "git", "rev-parse", "HEAD", ]
process = Popen(cmd_list, stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()
git_sha1 = stdout.decode('ascii').strip()
conn = psycopg.connect(**DB_PARAMETERS)
curs = conn.cursor()
sql = """UPDATE table SET git_sha1 = %(git_sha1)s WHERE id=1;"""
curs.execute(
sql,
vars = {
"git_sha1": git_sha1
}
)
conn.commit()
conn.close()
For the moment I have a git_sha1
field in the database as VARCHAR(40)
, but as a git revision hash is an hexadecimal string, it can be better to restrict the characters to only be [0-9a-f]. But I don't feel comfortable to manually set a domain only for that field... I have the feeling it should exist a better and more robust way for doing that.
So, does such a better and more elegant way exist for writing this kind of data in a PostgreSQL database?
Versioning:
- Python 3.6.9 (default, Jan 26 2021, 15:33:00)
- git version 2.33.1
- psql (PostgreSQL) 12.4 (Ubuntu 12.4-1.pgdg18.04 1)
- Ubuntu 18.04 (5.4.0-87-generic x86_64 GNU/Linux)
CodePudding user response:
Git IDs are SHA-1 checksums. These are represented as 40 character hex strings, but they are really 20 byte numbers. Store them as binary data: bytea
. This will cut storage in half.
decode
the hex string when inserting, encode
back to hex when fetching.
create temporary table foo ( git_id bytea );
insert into foo (git_id) values
(
decode('f844cdc09651448d6c3e765fadac448253a16928', 'hex')
);
select encode(git_id, 'hex') from foo;
In psycop2, or you can convert it to bytes
and psycop will do the right thing.
curs.execute(
sql,
vars = {
"git_sha1": bytes.fromhex(git_sha1)
}
)
See Binary adaptation in the psycop docs.