In Git, can one change the author, date, etc. without changing the SHA?

Time:08-25

In Git, if one changes the commit author, comment, date, or parent, does it always change the commit SHA?

CodePudding user response:

In Git, commits are immutable and identified by their hash. If you take an existing commit and modify it, such as by changing the commit message, data, author, committer, or parents, then that will create a new commit with a different hash.

This is because a cryptographic hash function is used, and changing the input should always change the output. With SHA-1, the default hash algorithm, it is theoretically possible to create two differing commits with the same hash, but this takes about USD 45,000 worth of cloud computing resources, and most versions of Git will detect this tampering and refuse to operate on the repository. With the newer SHA-256, creating collisions is not believed to be currently possible.

So the answer is that practically, changing the inputs to the commit changes the hash, but with SHA-1, it is theoretically possible to create two commits with the same hash if you spend a lot of resources.

CodePudding user response:

The answer to this becomes clear once you understand what "commit SHA" means.

Firstly, "SHA" stands for "Secure Hash Algorithm". Specifically, git by default uses an algorithm called "SHA-1". A hashing algorithm is a set of mathematical steps that can take any input in a large range and produce a smaller, fixed-size output, with the property that supplying the same input will always produce the same output. Because there are more possible inputs than possible outputs, there must be "collisions" where two inputs produce the same output - a consequence of the "pigeonhole principle" - but the "secure" in "SHA" means, among other things, that such collisions are designed to be impractically hard to find.
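The two properties described above can be illustrated with a small Python sketch using the standard `hashlib` module. The helper names `sha1_hex` and `first_truncated_collision` are made up for this illustration. Full SHA-1 collisions are far too expensive to search for this way, so the sketch assumes a deliberately weakened hash that keeps only the first byte of the digest; with just 256 possible outputs, the pigeonhole principle guarantees a collision within 257 inputs.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of `data` as 40 hexadecimal characters."""
    return hashlib.sha1(data).hexdigest()


def first_truncated_collision(n_bytes: int = 1):
    """Find two distinct inputs whose SHA-1 digests share their first
    `n_bytes` bytes. This is a toy: truncating the digest shrinks the
    output space so that a collision is cheap to find."""
    seen = {}  # maps truncated digest -> first input that produced it
    i = 0
    while True:
        data = str(i).encode()
        prefix = hashlib.sha1(data).digest()[:n_bytes]
        if prefix in seen:
            return seen[prefix], data
        seen[prefix] = data
        i += 1


if __name__ == "__main__":
    # Same input, same output, every time.
    print(sha1_hex(b"fix typo") == sha1_hex(b"fix typo"))
    # A one-character change gives an unrelated-looking digest.
    print(sha1_hex(b"fix typo"))
    print(sha1_hex(b"fix typo!"))
    # Two different inputs that collide on the first digest byte.
    print(first_truncated_collision(1))
```

The real algorithm differs from the toy only in the size of the output space: 160 bits instead of 8, which is what makes collisions impractical to find by brute force.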

(SHA-1 is no longer state of the art in this regard: "attacks" have been found that make finding a collision easier than intended, but they are still far too expensive to be of practical use to most people.)

The second piece of the puzzle is what git means by a "commit". At one level, git is simply a database of "blobs", similar to files on a disk (git's own documentation calls them "objects" and reserves "blob" for file contents). The contents of each file you commit are stored as a blob, as is the "tree" listing file names and permissions. A commit is actually just another blob - a piece of text listing a date, an author, a commit message, references to parent commits, and a reference to the tree of files that exist at that point in history.

Where these come together is that every blob, prefixed with a short header giving its type and size, is passed as the input to SHA-1 and indexed in the database by the resulting hash. A "commit hash" is simply the SHA-1 hash of the blob representing that commit.
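As a rough sketch of that hashing step, the following Python reproduces how an ID is derived for a SHA-1 repository: the input is the object type, a space, the body length in bytes, a NUL byte, and then the body itself. The function names, the author strings, and the fixed timestamp are invented for illustration; the tree ID used in the sample commit is the well-known ID of the empty tree.

```python
import hashlib

# ID of the empty tree in a SHA-1 repository.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way git does for SHA-1 repositories:
    sha1("<type> <size>\\0" + body)."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def make_commit_body(author: str, message: str) -> bytes:
    """Build the text of a parentless commit pointing at the empty tree.
    The timestamp is a fixed, made-up value so the result is repeatable."""
    return (
        f"tree {EMPTY_TREE}\n"
        f"author {author} 1700000000 +0000\n"
        f"committer {author} 1700000000 +0000\n"
        "\n"
        f"{message}\n"
    ).encode()


if __name__ == "__main__":
    original = make_commit_body("Alice <alice@example.com>", "Initial commit")
    reauthored = make_commit_body("Bob <bob@example.com>", "Initial commit")
    # Only the author line differs, yet the two IDs are unrelated.
    print(git_object_id("commit", original))
    print(git_object_id("commit", reauthored))
```

Writing the same body to a file and running `git hash-object -t commit` on it should give the same ID, which is one way to check the sketch against a real installation.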

So, to come back to the question: if you change the commit message, author, etc. in a commit, you change its content; new content means new input to SHA-1, which means a new, and unpredictable, hash. If you change the content of a file, the hash of the file will change; the tree referencing that file will be updated to include that hash, so its hash will change; and the commit referencing that tree will be updated to reference that, so you get a new commit hash.
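The chain of changes described above (file, then tree, then commit) can be traced in a simplified Python model. It assumes a flat directory of regular files with mode `100644`, no parent commits, and a fixed made-up timestamp; the function names and sample data are hypothetical. Real trees also handle subdirectories and other file modes, which this sketch leaves out.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> bytes:
    """Raw 20-byte SHA-1 of "<type> <size>\\0" + body."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(files: dict) -> bytes:
    """ID of a flat tree. `files` maps file name (str) to content (bytes).
    Each entry is "<mode> <name>\\0" followed by the raw ID of the blob;
    entries are ordered by name (simplified: ASCII names, no subdirectories)."""
    body = b""
    for name in sorted(files):
        body += b"100644 " + name.encode() + b"\0" + object_id("blob", files[name])
    return object_id("tree", body)


def commit_id(files: dict, author: str, message: str) -> str:
    """Hex ID of a parentless commit whose tree is built from `files`."""
    body = (
        f"tree {tree_id(files).hex()}\n"
        f"author {author} 1700000000 +0000\n"
        f"committer {author} 1700000000 +0000\n"
        "\n"
        f"{message}\n"
    ).encode()
    return object_id("commit", body).hex()


if __name__ == "__main__":
    alice = "Alice <alice@example.com>"
    before = {"README": b"hello\n"}
    after = {"README": b"hello, world\n"}
    # Editing the file changes the blob ID, hence the tree ID, hence the commit ID.
    print(tree_id(before).hex(), commit_id(before, alice, "Add README"))
    print(tree_id(after).hex(), commit_id(after, alice, "Add README"))
    # Changing only the author leaves the tree ID alone but still changes the commit ID.
    print(commit_id(before, "Bob <bob@example.com>", "Add README"))
```

The same reasoning extends to parents: a commit body names its parents by hash, so rewriting one commit changes the ID of every commit built on top of it.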

(This is, by the way, the same thing that makes "blockchains" immutable. The only real difference between blockchains and git is the way strangers agree which "commit history" represents the "true" order of events.)
