Home > Enterprise >  Writing a git revision hash (SHA-1) to a PostgreSQL database with psycopg2
Writing a git revision hash (SHA-1) to a PostgreSQL database with psycopg2

Time:10-22

Short

What's the most efficient and effective way to store a git revision hash (SHA-1) such as f844cdc09651448d6c3e765fadac448253a16928 into a PostgreSQL database (> v.11) with psycopg2?

Details and code

I have an SHA-1 hash as an hexadecimal string in Python, which I would like to store in a PostgreSQL database:

import psycopg2
from subprocess import Popen, PIPE

psycopg2.__version__ # prints '2.9.1 (dt dec pq3 ext lo64)'

cmd_list = [ "git", "rev-parse", "HEAD", ]
process = Popen(cmd_list, stdout=PIPE, stderr=PIPE)
stdout, stderr = process.communicate()
git_sha1 = stdout.decode('ascii').strip()

conn = psycopg.connect(**DB_PARAMETERS)
curs = conn.cursor()
sql = """UPDATE table SET git_sha1 = %(git_sha1)s WHERE id=1;"""

curs.execute(
    sql,
    vars = {
        "git_sha1": git_sha1
    }
)

conn.commit()
conn.close()

For the moment I have a git_sha1 field in the database as VARCHAR(40), but as a git revision hash is an hexadecimal string, it can be better to restrict the characters to only be [0-9a-f]. But I don't feel comfortable to manually set a domain only for that field... I have the feeling it should exist a better and more robust way for doing that.

So, does such a better and more elegant way exist for writing this kind of data in a PostgreSQL database?

Versioning:

  • Python 3.6.9 (default, Jan 26 2021, 15:33:00)
  • git version 2.33.1
  • psql (PostgreSQL) 12.4 (Ubuntu 12.4-1.pgdg18.04 1)
  • Ubuntu 18.04 (5.4.0-87-generic x86_64 GNU/Linux)

CodePudding user response:

Git IDs are SHA-1 checksums. These are represented as 40 character hex strings, but they are really 20 byte numbers. Store them as binary data: bytea. This will cut storage in half.

decode the hex string when inserting, encode back to hex when fetching.

create temporary table foo ( git_id bytea );

insert into foo (git_id) values 
  (
    decode('f844cdc09651448d6c3e765fadac448253a16928', 'hex')
  );

select encode(git_id, 'hex') from foo;

In psycop2, or you can convert it to bytes and psycop will do the right thing.

curs.execute(
    sql,
    vars = {
        "git_sha1": bytes.fromhex(git_sha1)
    }
)

See Binary adaptation in the psycop docs.

  • Related