Change git commit author without creating a new commit

Time:07-28

There are many posts about how to change the author / email of git commits, but based on my understanding and what I'm observing, all of them simply create a new commit with the adjusted user info, but do not actually remove the old commit.

Here's the situation. Someone committing code to our repository was new to git, and set up his global git configuration incorrectly, such that instead of that user's name and email address, it was set up with that user's email address as the name, and his email password as the email address.

Naturally, that person has since changed his password, but it's still lingering in the repo and we'd like to fix that person's commits. Unfortunately, no one knew that that was how it was set up on his development PC until he had already pushed, and now it's visible in GitLab for all those commits.

We've tried a variety of things, such as git filter-branch --env-filter, git replace, git rebase, basically everything that comes up when you google anything along the lines of "change git commit author." It seems like the branch is relatively clean at this point. None of the "tainted" commits are still in that branch, but they are still in the repo somewhere. For example, on one of the dev PCs, git log --reflog | grep "ROGUE PASSWORD" still shows several results. Even though they aren't in the branch, if someone browses git log --reflog, they can still see it in there somewhere, and you can put those commit hashes into GitLab and still find them.

It seems as though there truly might not be a way to resolve this... that is, you can shuffle commits around and fix a branch, but git doesn't seem to give you a way to truly delete a commit and all its metadata (that might not be true, and if it is, it's generally unnecessary, so maybe it's just hard to find documentation of that option?). To some degree, that's ok... like I said, he has since changed that password. But still, after beating our heads against the wall for a couple hours, we have to know: can this be fixed? If I have a list of all commit hashes associated with the wrong name/email, is there a way to really and truly remove them from the repo?

CodePudding user response:

The Git commit ID (a SHA-1 hash) is computed from the contents of the commit object. The commit object contains the following information:

  1. SHA-1 hash of the tree object containing the file versions attached to the commit
  2. SHA-1 hash(es) of the parent commit(s)
  3. Author and committer name and email
  4. Author and committer date and time
  5. Commit message

Changing any of these will result in a new SHA-1 hash.

For more information on this see: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects.
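To make the point above concrete, here is a minimal sketch in a throwaway repository. The identities, file name and temporary directory are made-up placeholders, not values from the question. It amends the author of a commit, shows that the hash changes, and shows that the old commit object is still sitting in the object store afterwards.

```shell
#!/bin/sh
# Sketch: fixing the author yields a NEW commit; the old one is not deleted.
set -e

repo=$(mktemp -d)            # throwaway repository (arbitrary location)
cd "$repo"
git init --quiet

# Commit with a deliberately wrong identity (hypothetical values).
echo hello > file.txt
git add file.txt
git -c user.name='jdoe@example.com' -c user.email='wrong-value' \
    commit --quiet -m 'Add file'
old_hash=$(git rev-parse HEAD)

# Rewrite the author of the tip commit; --amend builds a new commit object.
git -c user.name='J. Doe' -c user.email='jdoe@example.com' \
    commit --quiet --amend --no-edit --reset-author
new_hash=$(git rev-parse HEAD)

echo "old: $old_hash"
echo "new: $new_hash"

# Raw content of the new commit object: tree, author, committer, message.
git cat-file -p "$new_hash"

# The original object still exists even though no branch points at it.
old_type=$(git cat-file -t "$old_hash")
echo "old object is still a: $old_type"
```

The same thing happens with git rebase or git filter-branch: every rewritten commit gets a new hash, and the originals are left behind until something prunes them.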

Branches and tags should point, directly or through a chain of parent commits, to every commit that still has "value" within the project.

Orphaned commits can occur for various reasons, such as rebasing, deleting old branches, etc. They may persist for some time after they are no longer pointed to by a branch or tag, directly or through a chain of parent commits, but this should not normally be a concern, since their being orphaned indicates they no longer have "value". Also, the remote repository will periodically clean these up to reclaim the space, and these commits will not be transferred to local repositories that clone or fetch afterwards unless the user explicitly fetches the commit from the remote repository.

If there are true concerns about keeping the commit in the repository, such as when it may contain sensitive information, expletives, whatever, you can use git gc (https://git-scm.com/docs/git-gc) to prune it. Be aware that this permanently deletes orphaned commits from your local repository, and that by default it spares unreachable objects that are less than two weeks old as well as anything still referenced from the reflog. To do this on a remote repository, for example Bitbucket or GitHub, you would need to refer to the platform documentation on how to force the server to do garbage collection. In general, forcing garbage collection on the remote server is not recommended, since it may prevent recovery of items when needed, and most servers periodically run garbage collection anyway to remove commits that have been orphaned for a long time.
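As a rough illustration of forcing the cleanup locally, the sketch below recreates the situation in a throwaway repository (identities and paths are hypothetical), then empties the reflog and prunes immediately. It only affects the repository it runs in; it says nothing about what a hosting server keeps.

```shell
#!/bin/sh
# Sketch: make a rewritten commit really disappear from ONE local repository.
set -e

repo=$(mktemp -d)            # throwaway repository (arbitrary location)
cd "$repo"
git init --quiet

echo hello > file.txt
git add file.txt
git -c user.name='jdoe@example.com' -c user.email='wrong-value' \
    commit --quiet -m 'Add file'
old_hash=$(git rev-parse HEAD)

# Fix the author; the old commit is now only reachable through the reflog.
git -c user.name='J. Doe' -c user.email='jdoe@example.com' \
    commit --quiet --amend --no-edit --reset-author
new_hash=$(git rev-parse HEAD)

# 1. Drop every reflog entry, so nothing references the old commit any more.
git reflog expire --expire=now --all

# 2. Prune unreachable objects right away instead of after the grace period.
git gc --quiet --prune=now

# cat-file -e exits non-zero when the object does not exist.
if git cat-file -e "$old_hash" 2>/dev/null; then
    status="still present"
else
    status="pruned"
fi
echo "old commit $old_hash: $status"
```

Both steps are destructive: once the reflog is emptied and the objects are pruned, there is no way to get the old commits back from that repository.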

CodePudding user response:

TL;DR

Be patient: it can take a few months even for seemingly orphaned commits to get garbage collected.

Or expire the reflog and run git gc --aggressive --prune=now in every sandbox and server instance you know of.

The details

This is a tricky situation, because anyone who cloned your repo or fetched from it while those commits were there will have local copies too.

Orphaned commits eventually get deleted when you run garbage collection, but it can take a long time (and maybe forever) before commits are actually orphaned: any branch, tag, PR, issue or anything else that points to those commits will keep them alive. So to truly orphan a commit, you need to delete everything that refers to it.
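One way to check what is still holding a commit in place is to ask git which refs contain it. The sketch below uses a throwaway repository; the branch name stale-copy and the tag name release-candidate are invented for the example. A hosting platform may keep additional server-side refs that a local clone never sees, so an empty list locally does not prove the server has let go.

```shell
#!/bin/sh
# Sketch: list the refs that keep a given commit reachable, then remove them.
set -e

repo=$(mktemp -d)            # throwaway repository (arbitrary location)
cd "$repo"
git init --quiet

echo hello > file.txt
git add file.txt
git -c user.name='jdoe@example.com' -c user.email='wrong-value' \
    commit --quiet -m 'Add file'
old_hash=$(git rev-parse HEAD)

# A forgotten branch and a tag that still point at the bad commit.
git branch stale-copy
git tag release-candidate "$old_hash"

# Fix the author on the current branch; it no longer contains the old commit.
git -c user.name='J. Doe' -c user.email='jdoe@example.com' \
    commit --quiet --amend --no-edit --reset-author

# Every ref whose history still includes the old commit.
refs=$(git for-each-ref --format='%(refname)' --contains "$old_hash")
echo "refs still containing $old_hash:"
echo "$refs"

# Delete them (only once you are sure nothing of value lives there).
git branch -D -q stale-copy
git tag -d release-candidate >/dev/null

remaining=$(git for-each-ref --format='%(refname)' --contains "$old_hash")
echo "remaining refs: ${remaining:-none}"
```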

Even if you have nothing pointing to the commits, there's still the reflog. By default it keeps entries for 90 days (30 days for entries that are no longer reachable from the current tip of the branch), and no commit that can be found from a reflog entry in someone's sandbox will get garbage collected: it's still not orphaned if it can be accessed from an entry in the reflog!

You can be patient: eventually these commits will fall out of the reflog and get garbage collected. Since the user changed their password, that might be good enough.

Or you can be less patient and run git reflog expire --expire=now --all followed by git gc --aggressive --prune=now in everyone's sandboxes (--aggressive on its own only makes git repack harder; it is the emptied reflog and --prune=now that let the commits go). But that might clean up things they'll later wish they still had access to. And of course, that's only going to be effective if they don't have any lingering local branches or tags pointing to the bad history.
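To see which commits are being kept alive by the reflog alone, git fsck can be run with and without the reflog counted as a starting point. This is a minimal sketch in a throwaway repository with made-up identities.

```shell
#!/bin/sh
# Sketch: find commits that only the reflog is keeping alive.
set -e

repo=$(mktemp -d)            # throwaway repository (arbitrary location)
cd "$repo"
git init --quiet

echo hello > file.txt
git add file.txt
git -c user.name='jdoe@example.com' -c user.email='wrong-value' \
    commit --quiet -m 'Add file'

git -c user.name='J. Doe' -c user.email='jdoe@example.com' \
    commit --quiet --amend --no-edit --reset-author

# What the question observed: the reflog still lists the bad identity.
git log --reflog --format='%h %an <%ae> %s'

# Counting the reflog as a reference, nothing is reported as unreachable.
with_reflog=$(git fsck --unreachable 2>/dev/null \
    | grep -c '^unreachable commit' || true)

# Ignoring the reflog, the old commit shows up as unreachable.
without_reflog=$(git fsck --unreachable --no-reflogs 2>/dev/null \
    | grep -c '^unreachable commit' || true)

echo "unreachable commits, reflog counted:  $with_reflog"
echo "unreachable commits, reflog ignored:  $without_reflog"
```

Anything that appears only in the second count will survive garbage collection until its reflog entries expire or are removed.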

You'll need to refer to your server's documentation to figure out how to run the equivalent commands there, keeping in mind that the branch you submitted a PR/MR from might be kept on the server forever.

To truly clean on the server side, you might have to create a new repo and just push the good branches and tags to it, but that would mean losing your issues and PR history and all that.
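The new-repo route can be sketched with two local directories. Here a bare repository stands in for the freshly created project on the server, and the remote name clean, the tag v1.0 and the identities are all hypothetical. A push only transfers objects reachable from the refs being pushed, so the orphaned commit stays behind.

```shell
#!/bin/sh
# Sketch: push only the good branch and tags into a brand-new repository.
set -e

work=$(mktemp -d)            # scratch area (arbitrary location)
git init --quiet "$work/old"
cd "$work/old"

echo hello > file.txt
git add file.txt
git -c user.name='jdoe@example.com' -c user.email='wrong-value' \
    commit --quiet -m 'Add file'
old_hash=$(git rev-parse HEAD)

git -c user.name='J. Doe' -c user.email='jdoe@example.com' \
    commit --quiet --amend --no-edit --reset-author
new_hash=$(git rev-parse HEAD)
git tag v1.0                 # lightweight tag on the fixed commit

# Stand-in for the new, empty project created on the server.
git init --quiet --bare "$work/clean.git"
git remote add clean "$work/clean.git"

# Push the current (good) branch and the tags, nothing else.
branch=$(git symbolic-ref --short HEAD)
git push --quiet clean "$branch"
git push --quiet clean --tags

# The old commit was never sent, so the new repository does not have it.
if git --git-dir="$work/clean.git" cat-file -e "$old_hash" 2>/dev/null; then
    in_clean=yes
else
    in_clean=no
fi
echo "old commit present in new repository: $in_clean"
```

This only helps if every branch and tag you push has already been rewritten; a bad commit that is still in the ancestry of a pushed ref travels along with it.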
