github: Find hash (tag) of a specific commit in a download (as .zip or .tar.gz)?

Time:09-21

Scenario: I have two directories originating from one and the same github repository, but downloaded as .zip (or maybe .tar.gz) at different times.

Question: How can I find the commit hash inside these two directories? Is it even stored anywhere?

Background: I was hacking on some code, got sidetracked, and I have forgotten why I have two different directories. The directories are clearly different (used diff -r dir1 dir2), and the differences are not just MY little hacks. The directories have a file setup.cfg that both contain the line version = 0.3.5, so dirs are the same version/"release" but not the same commit hash. I would like to find out what the commit hashes are/were.

Answer:

If it has been downloaded as a zip or tar archive, it's not a commit and the hash ID may well be gone. I believe GitHub sticks the raw hash ID into an extended header, though, since they use git archive to build the download. From the git archive documentation:

In the [case where a commit hash ID is used to build the archive] ... Additionally the commit ID is stored in a global extended pax header if the tar format is used; it can be extracted using git get-tar-commit-id. In ZIP files it is stored as a file comment.

You will need the original tar or zip file to test for this. If it's an uncompressed tar file:

git get-tar-commit-id < archive

If it's compressed (.tar.gz), decompress it first with zcat, gunzip, or whatever is appropriate on your system:

gunzip < foo.tar.gz | git get-tar-commit-id

for example.
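The quoted documentation says that ZIP files carry the ID as the archive comment. Here is a minimal sketch that handles both formats, assuming unzip is installed (unzip -z prints the archive comment) and that the archive really was produced by git archive from a commit. The function name archive_commit_id is made up for illustration and is not a git command.

```shell
# archive_commit_id: hypothetical helper, not part of git.
# Prints the commit ID embedded in a git-archive download, if there is one.
archive_commit_id() {
    case "$1" in
        *.tar.gz|*.tgz)
            # Decompress on the fly; git only reads the leading pax header.
            gunzip < "$1" | git get-tar-commit-id ;;
        *.tar)
            git get-tar-commit-id < "$1" ;;
        *.zip)
            # unzip -z prints the archive name and then the comment;
            # keep only a line that looks like a full SHA-1 or SHA-256 ID.
            unzip -z "$1" | grep -E '^[0-9a-f]{40,64}$' ;;
        *)
            echo "unsupported archive type: $1" >&2
            return 1 ;;
    esac
}
```

Usage: archive_commit_id foo.zip. If nothing is printed, the archive has no embedded ID (for example because it was built from a tree rather than a commit, or was re-packed by some other tool).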

What if you don't have the original archive, or it has no ID?

In general, there is no unique mapping from an extracted source tree back to a particular commit. In some sense this doesn't matter: if you can obtain a Git tree hash for a source tree, and can find all the commits that have that tree hash, then all of those commits are the commits that would produce that archive. But git archive can omit files, add files, or substitute file contents (for example via the export-ignore and export-subst attributes), so the extracted tree may not match any commit's tree exactly.
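For the simple case where git archive changed nothing, one rough way to get a candidate tree hash is to stage the extracted directory into a throwaway index inside a clone of the repository and ask Git to write a tree from it. This is only a sketch under strong assumptions: file modes and line endings survived extraction unchanged, no files were omitted or substituted, and you don't mind the blobs being added to the clone's object database. The name tree_hash_of_dir is hypothetical.

```shell
# tree_hash_of_dir: hypothetical helper, not part of git.
# Run it from inside a clone; $1 is the extracted directory to hash.
tree_hash_of_dir() {
    gitdir=$(git rev-parse --absolute-git-dir) || return 1   # the clone we are in
    tmpd=$(mktemp -d) || return 1                            # holds the throwaway index
    (
        cd "$1" || exit 1
        # Point git at the clone's object store, but use the extracted
        # directory as work tree and a fresh file as index.
        export GIT_DIR="$gitdir" GIT_WORK_TREE="$PWD" GIT_INDEX_FILE="$tmpd/index"
        # --force stages files even if a .gitignore in the tree would skip them.
        git add --all --force . && git write-tree
    )
    status=$?
    rm -rf "$tmpd"
    return "$status"
}
```

Usage, from inside the clone: searchfor=$(tree_hash_of_dir ../dir1). The clone's real index and work tree are not touched, since all three paths are overridden for the duration of the call.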

Finding the actual tree hash for some set of files is nontrivial, although I have a program that can do it here. Read through the source to learn the conditions under which it can work. Once you have that, you can search for commits that have that as their tree in their commit object, using git rev-parse:

# $start_points is left unquoted on purpose so it can hold several revisions
git rev-list $start_points |
while read -r chash; do
    thash=$(git rev-parse "$chash^{tree}")
    [ "$thash" = "$searchfor" ] && echo "tree found in commit $chash"
done

for instance (untested; you'll need to set start_points, e.g. to --all, and searchfor to the tree hash you are looking for).
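The same search can be done without spawning one git rev-parse per commit, because git log can print each commit's tree hash directly with the %T format placeholder. A sketch, with commits_with_tree as a made-up helper name and --all as an assumed default when no starting points are given:

```shell
# commits_with_tree: hypothetical helper, not part of git.
# $1 is the tree hash to look for; any further arguments are passed to
# git log as starting points (default: --all).
commits_with_tree() {
    want=$1
    shift
    [ $# -gt 0 ] || set -- --all
    # %H is the commit hash, %T the hash of its top-level tree.
    git log --format='%H %T' "$@" |
    while read -r chash thash; do
        if [ "$thash" = "$want" ]; then
            echo "tree found in commit $chash"
        fi
    done
}
```

Usage: commits_with_tree "$searchfor" or commits_with_tree "$searchfor" main v0.3.5. More than one commit can be reported, since several commits may share the same tree.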
