What does "replaced" mean in git log?

When I ran git log --all, I found one interesting commit in the log:

commit 3a1a6bfbd936ea441ecf1f071e82f89c7e8bbf6c (replaced, origin/main)

What does the replaced keyword in the parentheses mean? And how is it triggered?

CodePudding user response：

This means someone used git replace.

What git replace does is allow you to tell future Git operations that, instead of some original object, they ought to look instead at some replacement object. This paragraph covers how replacement works but does not tell you what this all means. The problem is that at this level, the meaning doesn't exist yet. It's like saying neutron capture causes the U-235 nucleus to fission into two lighter-weight nuclei, emitting two neutrons. True, but so what? Well, so, nuclear reactor or atomic bomb. We've gone from dry nuclear physics to serious consequences.

Git replacements are not quite so dramatic, fortunately. But a simple replacement can have huge consequences. The consequences it will have, in your repository, are not something we can determine in advance. All we can do is describe the idea behind replacements.

The idea behind replacements

Any Git object, once made, is read-only, and continues to exist in the repository as long as someone or something is using it. The reason for this read-only quality is that each object is found (or addressed, to use a fancy term) by its hash ID, in a key-value database whose keys are hash IDs and whose values are the hashed objects. When Git extracts an object from the database, Git re-computes the hash and verifies that it matches the key used to retrieve the object. This guarantees that the object data are not corrupt.¹
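As a rough illustration of the key-value idea, here is a minimal sketch that stores a blob and then recomputes its hash by hand. The throwaway repository and the variable names (repo, key, rehash) are made up for the example.

```shell
set -e
repo=$(mktemp -d)            # throwaway repository, for illustration only
cd "$repo"
git init -q

# Write a blob into the object database; Git prints the key (hash ID).
key=$(printf 'hello\n' | git hash-object -w --stdin)

# Fetch the value back by its key and hash it again.
rehash=$(git cat-file blob "$key" | git hash-object --stdin)

echo "stored under: $key"
echo "recomputed:   $rehash"
```

The two printed hash IDs should be identical, because the key is nothing more than the hash of the stored content.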

If we make a mistake in a new commit that nobody else is using yet, and catch the mistake quickly, we can correct it by replacing the original commit with a new one. Our original commit is found only by the hash ID stored in some branch name. If we make a new replacement commit for it, with the mistake corrected, the new commit will have some other, different hash ID. We store the new replacement commit's hash ID in the branch name (which is writable) and we're done: the "bad" commit is still there, but is unused. With no one using it, Git will eventually drop it entirely.²
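A small sketch of this, using git commit --amend in a scratch repository. The identity, file name and commit messages are invented for the example.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "add file, with a typo in the mesage"
old=$(git rev-parse HEAD)

# "Fix" the commit: this really makes a new commit with a new hash ID
# and moves the branch name to it.
git commit -q --amend -m "add file"
new=$(git rev-parse HEAD)

echo "before amend: $old"
echo "after amend:  $new"

# The original commit object is still in the database, just unused.
git cat-file -t "$old"
```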

That's fine for a new commit, whose hash ID is stored only in a single branch name. But what if the commit isn't so new? In particular, commit hash IDs get stored in later commits. If this "bad" commit is part of a commit chain, we have a problem.

Remember that commits form backwards-looking chains, found by a branch name that points to what Git calls the tip commit: the last commit in the chain. That is, given some series of commits, each with its own hash ID, we might draw them by using single uppercase letters to stand in for the hash IDs:

... <-F <-G <-H   <--main

The name main points to the tip commit, whose hash is H. That commit points backwards to earlier commit G. Commit G points backwards to earlier commit F, and so on.
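To see these backwards links for yourself, you can dump a raw commit object. This is only a sketch in a scratch repository; the commit messages and file names are placeholders.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Build the chain F <- G <- H, one file per commit.
for name in F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done

# The raw content of H includes a "parent" line holding G's hash ID.
git cat-file -p HEAD
parent_in_H=$(git cat-file -p HEAD | sed -n 's/^parent //p')
G=$(git rev-parse HEAD~1)
echo "H records parent $parent_in_H; G is $G"
```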

If there's a fault in commit F, we could try to do what git commit --amend does: make a new and improved F' and shove F up out of the way:

     F ...
    /
... <-F'

But when we do that, existing commit G—which literally contains the hash ID of existing commit F and cannot be changed—still points to F:

     F <-G <-H   <--main
    /
... <-F'

Our simple attempt to amend F doesn't work, because main points, not to F, but to H. H points to G, and will do so forever. G points to F, and will do so forever. We can copy G and H to new-and-improved G' and H':

     F <-G <-H   <--main
    /
... <-F' <-G' <-H'

and having made three copies, we can now re-point the branch name main:

     F <-G <-H
    /
... <-F' <-G' <-H'   <--main

This is what git rebase does. But it has the drawback that every commit after F must also be copied. If there are complicated chains:

             I--J   <-- br1
            /
...--F--G--H   <-- main
            \
             K--L   <-- br2

the whole thing rapidly becomes a nightmare of history rewriting, with the need to move multiple branch names. You can do this using git filter-branch or git filter-repo, but it's painful and not something you want to do frequently. This is where git replace comes in.
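For the simple linear case described above (one branch, with G and H copied onto F'), a sketch of the copy-everything approach might look like this. The repository, commit names and messages are invented; a real history would rarely be this tidy.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # make sure the branch is called main
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for name in E F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
F=$(git rev-parse main~2)
H=$(git rev-parse main)

# Make F' by amending F on a detached HEAD.
git checkout -q "$F"
git commit -q --amend -m "commit F (fixed)"
Fp=$(git rev-parse HEAD)

# Copy everything after F (that is, G and H) onto F', then move main.
git rebase -q --onto "$Fp" "$F" main

git log --format='%h %s' main
```

After this, main names H', whose grandparent is F'. The original F, G and H are still in the object database but no branch name finds them.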

¹If the key used to retrieve the object does not match the recomputed hash of the object, something has happened to the data since they were originally written. The hash function is of no help in correcting the erroneous data, so at this point we're stuck with finding a good copy, presumably in another clone or a backup. That's why disk drives use error-correcting codes, e.g., Reed-Solomon codes, rather than cryptographic checksums. Git's job here is only to find corruption, not to fix it.

²This "eventually" is a maintenance operation. The newfangled git maintenance command can be used to tune this stuff—that's the future direction for Git—but the actual dropping is done via git gc or git gc --auto, in existing Git usage. That works as follows:

git gc runs git reflog expire.
git reflog expire scans reflogs, which contain reflog entries.
The reflog entries each have a date-and-time stamp, and a status ("reachable" or "unreachable") implied by the current hash ID stored in the corresponding ref.
The status leads git reflog expire to one of two "expiry" values: reachable, for commits reachable from the current ref value, and unreachable, for commits not reachable this way.
If the age of the entry exceeds the expiry value (by default, 90 days for "reachable" and 30 days for "unreachable"), the reflog entry is deleted.

This drops the last actual reference to the internal Git commit object, which can now be deleted via git prune, which git gc runs after git reflog expire. So, running git commit --amend right after git commit pushes the original (pre-amend) commit off to the side, where it lingers for at least 30 days thanks to reflog entries: one in the HEAD reflog and one in the branch reflog. Once the reflog entries are gone, there really is no reference to the commit, and git prune will prune it.
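A quick way to see the reflog entry that keeps the pre-amend commit alive, sketched in a scratch repository with made-up names:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)
git commit -q --amend -m "second attempt"

# The HEAD reflog still remembers where HEAD was before the amend.
git reflog
previous=$(git rev-parse 'HEAD@{1}')
echo "HEAD@{1} is $previous"
```

The expiry times themselves are configurable through gc.reflogExpire and gc.reflogExpireUnreachable, if the defaults do not suit you.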

Replacements

The mechanism Git uses for replacements is simple. There's a relatively low level routine in Git to obtain an object from the objects database—that key-value store I mentioned earlier, where the keys are hash IDs and the values are objects. You give the key to the database lookup code and it fishes out the value.

Now, if you allow replacements—there are control knobs for this, at this level—then when you call the "get me an object, I have its hash ID" function, the lookup function will check to see if the object's hash ID exists as a name in the refs/replace/ namespace.

So: we can make a replacement commit F' that is a new and improved version of F. This commit has a hash ID, once we've written it to the object database. Let's say F had hash ID aaaaaaa, and F' has hash ID bbbbbbb (I've shortened them from 40 characters to 7 to make them easier to deal with, and real hash IDs are of course random looking).

We now store the hash ID bbbbbbb under the name refs/replace/aaaaaaa. That is, the hash ID of commit F, whatever it is, becomes a refs/replace/ name. In that name we store the hash ID of the replacement commit, here bbbbbbb.
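In practice you let git replace create that name for you. Here is a minimal sketch, assuming a scratch repository in which F is the first of three commits; the variable names (F, Fp) and the messages are made up, and real hash IDs will of course not be aaaaaaa and bbbbbbb.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

# Build the chain F <- G <- H, one file per commit.
for name in F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
F=$(git rev-parse HEAD~2)

# Make F': same snapshot as F, but with a corrected message.
# (F has no parent in this sketch, so no -p option is needed.)
tree=$(git rev-parse "$F^{tree}")
Fp=$(git commit-tree -m "commit F (fixed)" "$tree")

# Record "use F' wherever F is asked for".
git replace "$F" "$Fp"

# The ref is named after F and holds the hash ID of F'.
stored=$(git rev-parse "refs/replace/$F")
echo "refs/replace/$F -> $stored"
```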

When some other piece of Git software calls the "look up object" function with hash ID aaaaaaa, the lookup function notices that refs/replace/aaaaaaa exists. It reads the hash ID stored in refs/replace/aaaaaaa and, instead of looking up (and error-checking) aaaaaaa, it looks up (and error-checks) bbbbbbb. It then returns the replacement object's content, instead of the original object's content.

This means that when git log or git checkout or any other Git command goes to use commit F, it gets commit F' instead. Hence we've successfully replaced commit F without actually changing commit F.³ The git log command in particular makes sure to notice that this happened (the lookup routine will set a flag for git log to see) and adds the replaced notation that you saw.
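To watch this happen, here is a sketch in a scratch repository (names and messages invented for the example). Asking for F by its own hash ID hands back the content of F'.

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for name in F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
F=$(git rev-parse HEAD~2)
tree=$(git rev-parse "$F^{tree}")
Fp=$(git commit-tree -m "commit F (fixed)" "$tree")
git replace "$F" "$Fp"

# G still names F as its parent, so the hash ID shown is still F's...
still_F=$(git rev-parse HEAD~2)

# ...but reading F now yields the content of F'.
subject=$(git log -1 --format=%s "$F")
echo "commit $still_F has subject: $subject"

# Per the question, this is where the "replaced" decoration shows up.
git log --oneline --decorate
```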

³Note that this makes git gc and git prune have to work harder, because object F is still referenced "for real", while F' is referenced via the refs/replace/ name. Fortunately it suffices for git gc to run with replacements disabled.

Seeing reality, and why that matters

If you want to see what's really in the database, without replacements, you can run git --no-replace-objects log. This will make git log call the "get an object" function with replacements disabled. You'll see the original history, not the replaced one.
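A side-by-side sketch of the two views, again in a scratch repository with invented names:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for name in F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
F=$(git rev-parse HEAD~2)
tree=$(git rev-parse "$F^{tree}")
Fp=$(git commit-tree -m "commit F (fixed)" "$tree")
git replace "$F" "$Fp"

# Default view: replacements are honored.
seen=$(git log -1 --format=%s "$F")

# Reality: replacements are switched off for this one command.
real=$(git --no-replace-objects log -1 --format=%s "$F")

echo "with replacements:    $seen"
echo "without replacements: $real"
```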

To list the replacements that are in effect, use git replace --list (or git replace with no arguments, which means --list), or, from scripts, git for-each-ref refs/replace.
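Both listing commands, sketched against a scratch repository with one replacement in it (all names are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for name in F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
F=$(git rev-parse HEAD~2)
tree=$(git rev-parse "$F^{tree}")
Fp=$(git commit-tree -m "commit F (fixed)" "$tree")
git replace "$F" "$Fp"

# Human-oriented: prints the hash ID of each object that has a replacement.
listed=$(git replace --list)
echo "$listed"

# Script-oriented: prints the replacement's hash ID and the ref name.
refs=$(git for-each-ref --format='%(objectname) %(refname)' refs/replace)
echo "$refs"
```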

Note that when you clone a repository, the cloning process normally does not copy the refs/replace/ namespace. Using git push also does not copy refs/replace/ names by default. So when you use git replace to construct illusory history in your repository, this only affects your repository.

You can replace non-commit objects too. Because replacement is such a low level operation, you can use it for various interesting effects. It's always local though, unless you take special action to get refs/replace/ references into another repository too.
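Here is a sketch of both halves of that: a plain clone that comes out without the replacement, followed by one possible "special action", an explicit fetch refspec for refs/replace/. The directory names (origin, copy) and the rest of the setup are invented for the example.

```shell
set -e
work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git config user.name "Example User"        # hypothetical identity
git config user.email "user@example.com"

for name in F G H; do
    echo "$name" > "$name.txt"
    git add "$name.txt"
    git commit -q -m "commit $name"
done
F=$(git rev-parse HEAD~2)
tree=$(git rev-parse "$F^{tree}")
Fp=$(git commit-tree -m "commit F (fixed)" "$tree")
git replace "$F" "$Fp"

# A plain clone brings over branches and tags, but not refs/replace/.
git clone -q "$work/origin" "$work/copy"
cd "$work/copy"
before=$(git for-each-ref refs/replace)

# Ask for the replacement refs explicitly.
git fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git rev-parse "refs/replace/$F")

echo "replacement refs right after clone: '$before'"
echo "after the explicit fetch:           $after"
```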

Note that using git filter-branch and git filter-repo will make the new repository with replacements honored (though git --no-replace-objects filter-branch won't, and presumably there are similar things with filter-repo). So one use for git replace is to edit history until you have it looking the way you want others to see it. You then run an otherwise no-op filter operation, which "cements the new history in place" as it were, without requiring replacements (they're now embedded, and the originals are just gone). You then publish this new, different repository instead of the original.