How to manually specify a git commit sha?

Time:12-11

This answer explains that normally a git commit SHA is generated based on various parameters. However, I would like to know: how can one specify a custom/particular/specific git commit sha (in Bash)?

For example, suppose one wants to create and push a commit to Git with the following sha:

1e23456ffd118db9dc04caf40a442040e5ec99f9

(For simplicity, assume it is a unique SHA.)

The XY problem is a manual mirror script between two different Git servers. The manual mirror is more efficient (saving computation time and server bandwidth) if I can skip certain commits from the source server. Skipping commits, however, means that a mirrored commit on the target server has a different parent than the same commit on the source server. That in turn changes the SHA, so I would have to keep a mapping between the SHAs on the source and target servers. In short, for the few commits that are actually mirrored, it would be more convenient to simply override the SHAs on the target server than to keep such a mapping or to ensure that both servers have exactly the same commits.

CodePudding user response:

A commit SHA isn't just "normally" generated based on those parameters; it is by definition a hash of those parameters. "SHA" (Secure Hash Algorithm) is the name of the family of hashing algorithms used to generate it, and Git uses SHA-1 by default.

Rather than trying to change the commit hashes, you should look for an efficient way to track them. One approach would be similar to how plugins like git svn work:

  • When copying a commit to the mirror, record the original commit hash as part of the new commit's commit message.
  • Possibly, since you're "skipping" commits in the original repo, each new commit should have multiple source hashes, since it will act like a "squash" of those commits.
  • Have a script which processes the result of git log and extracts these recorded commit hashes. This can then be used instead of the real commit hashes when determining what new commits to copy from the source.
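The script in the last bullet could look roughly like the following minimal sketch. It makes two assumptions that are not part of the answer above: the mirror commits record source hashes in a trailer line named `Mirrored-from:` (a hypothetical name), and the log text comes from something like `git log --format='%H%n%B%x00'`, so that each record is a hash line, the message, and a terminating NUL byte.

```python
import re

# Hypothetical trailer name; the answer only says the original hash is
# recorded somewhere in the new commit's message.
TRAILER = re.compile(r"^Mirrored-from: ([0-9a-f]{40})$", re.MULTILINE)

def build_mapping(log_text):
    """Map each recorded source hash to the mirror commit that contains it."""
    mapping = {}
    for record in log_text.split("\x00"):
        record = record.strip()
        if not record:
            continue
        # First line is the mirror commit hash, the rest is its message.
        mirror_hash, _, message = record.partition("\n")
        for source_hash in TRAILER.findall(message):
            mapping[source_hash] = mirror_hash
    return mapping

# Made-up sample log: one "squash" commit with two sources, one plain copy.
sample = (
    "a" * 40 + "\nSquash of two upstream commits\n\n"
    "Mirrored-from: " + "1" * 40 + "\n"
    "Mirrored-from: " + "2" * 40 + "\n\x00\n"
    + "b" * 40 + "\nInitial import\n\n"
    "Mirrored-from: " + "3" * 40 + "\n\x00"
)

mapping = build_mapping(sample)
for source_hash, mirror_hash in mapping.items():
    print(source_hash, "->", mirror_hash)
```

A mirror script could then look up a source hash in `mapping` to decide whether that commit has already been copied.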

However, make sure this is all worth it: if the eventual changes are all included, the chances are that git's existing de-duplication and compression will mean the overhead of the "skipped" commits is fairly low.

CodePudding user response:

Since you've already outlined in your question that you have ways of handling your differences, I will assume this question is really and only this:

I would like to know: how can one specify a custom/particular/specific git commit sha (in Bash)?

And not "or do you have any other ideas that I could use instead".

And with that question, the answer is actually quite simple:

You can't.


The way Git calculates the commit id is not merely a by-product of the implementation chosen. It is a core concept of how Git is designed.

The commit id is calculated based upon the content of the commit, and this includes, as you have observed, the link to the parent. Change the parent but keep everything else identical, and the commit id still changes.
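To make this concrete, here is a small sketch of how a SHA-1 object id is derived: Git hashes a header of the form `<type> <size>\0` followed by the object's bytes. The author line, timestamp, and parent hashes below are made-up illustration values, and the helper names are hypothetical.

```python
import hashlib

def git_object_id(kind, body):
    """SHA-1 over '<kind> <size>\\0' followed by the raw object bytes."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parent, message):
    """Build the text of a commit object (illustrative author/committer)."""
    who = "A U Thor <author@example.com> 1700000000 +0000"
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {who}")
    lines.append(f"committer {who}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()

# The id of the empty tree, computed the same way as any other object.
empty_tree = git_object_id("tree", b"")

# Same tree, author, date, and message; only the parent differs.
id_a = git_object_id("commit", commit_body(empty_tree, "1" * 40, "same message"))
id_b = git_object_id("commit", commit_body(empty_tree, "2" * 40, "same message"))

print(empty_tree)
print(id_a)
print(id_b)
```

The two commit ids differ even though only the `parent` line changed, which is why skipping commits on the mirror necessarily produces new ids.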

This is core to how the distributed part of the version control system works, and cannot be changed.

You simply cannot change the id of a commit and keep its contents the same. This is by design.

There have been some attempts at producing SHA-1 collisions by carefully constructing distinct inputs that end up having the same hash.

Here's one such successful attempt (collision): https://www.theregister.com/2017/02/23/google_first_sha1_collision/

'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time

I don't believe anyone has yet managed to take an arbitrary commit and make it hash to a specific, chosen commit id. The collision was carefully constructed by manipulating two files simultaneously according to very specific criteria such that they arrived at the same hash, but that hash was not chosen by the researchers.

TL;DR: It can't be done

The net effect of the collision(s) generated, though, is that Git will move away from SHA-1 at some point and go for a system (SHA-256) that produces longer and "more secure" (tm) hashes than what we have today. Since Git also wants to be backwards compatible with existing repositories, this work is not yet fully completed.

CodePudding user response:

From the comment by CodeCaster, it seems I could use the freely choosable bits in the commit message in `git commit -m "some message"` to ensure the SHA of the commit ends up with a specific value.

However, based on the comment by Lasse V. Karlsen, I would assume this approach requires computational resources that grow non-linearly. I did not go into detail on this, but I imagine/assume that as the commit history grows, the relative impact of the (limited, 5 MB) freely choosable bits of the commit message becomes smaller. I guess that could explain why leveraging these freely choosable bits in the commit message becomes costly.

So in practice, the answer seems to be: "You could (perhaps, if you spend a lot of computational resources), but you shouldn't."
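As a rough illustration of the cost, the sketch below brute-forces only the first three hex digits of the SHA from the question by varying a nonce in the commit message. The tree, author line, and `nonce:` convention are assumptions made for the example. As far as I understand it, the work depends on how many digits are targeted rather than on the size of the history: each extra hex digit multiplies the expected number of attempts by 16, so three digits take about 4,096 tries on average, while all 40 digits would take on the order of 16^40 = 2^160.

```python
import hashlib

def commit_id(message):
    """Hash an illustrative root commit whose only free part is the message."""
    body = (
        "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
        "author A U Thor <author@example.com> 1700000000 +0000\n"
        "committer A U Thor <author@example.com> 1700000000 +0000\n"
        "\n" + message + "\n"
    ).encode()
    return hashlib.sha1(b"commit %d\0" % len(body) + body).hexdigest()

def find_nonce(prefix, max_attempts=1_000_000):
    """Try successive nonces until the commit id starts with `prefix`."""
    for nonce in range(max_attempts):
        digest = commit_id(f"some message\n\nnonce: {nonce}")
        if digest.startswith(prefix):
            return nonce, digest
    raise RuntimeError("no match within max_attempts")

# Target only the first 3 of the 40 hex digits from the question.
nonce, digest = find_nonce("1e2")
print(nonce, digest)
```

This only matches a short prefix; reaching the full 40-digit value this way is not feasible in practice.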

CodePudding user response:

how can one specify a custom/particular/specific git commit sha (in Bash)?

One cannot. The commit hash is a value constructed, as you say, by hashing various values together, and the whole point is to uniquely identify a particular commit. You could commit the same set of files at a different time on a different machine and you'd end up with a different commit hash.

The way to ensure that you have the same commits on two different machines is to git pull (or similar) those commits from one machine to the other. You don't necessarily have to move all the commits -- you could e.g. squash them or cherry-pick only certain commits.
