Squash or Remove a range of commits into one (Over 54000 commits)

About a year and a half ago, I tested things on a git repo by creating a bot that made commits for me. I was just learning git and I wanted to look cool by having thousands of commits.

Essentially, I created a bot that appends a line to a file, stages it, commits it, then pushes it. So, about 54000 commits are worthless. How would I remove all of those commits? Is this a good idea?

The commits that don't have value, which are the ones I want removed, are in the middle, starting at 0c4068fb3 and ending at 42b8fae4b. So, the legit commits are before 0c4068fb3 and after 42b8fae4b. The commits that don't have value are easy to detect: when I created my bot, I put all of the commit messages I had written so far into a list, and the bot picked one at random for each commit. So, any commit message that repeats many times belongs to a commit without value. Most of those commits also say "first commit", or something like that.

So, here's the link to the commits section where the fake ones start. As you can tell, the commit messages keep repeating themselves.

The actual content of each fake commit is just one additional line in a file called bot.txt. So, there is nothing of any value in those commits.

If not, could you tell me how I can remove all 54000 commits and just keep the ones that actually have value?

Thanks

CodePudding user response：

Update: In your particular case, since all of the commits you wish to remove are writing to a file called bot.txt, and none of the commits you wish to keep write to that file, the simplest course of action is to use git-filter-repo to remove that single file from the entire history. Any commits that only touched that file will fall out of the new re-written repo. The result will be a similar repo without those 54K commits.
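
Before rewriting anything, it may be worth confirming that premise. A minimal sketch, assuming the range endpoints from the question and a branch named master; the helper names paths_touched_in_range and commits_touching_path are hypothetical, not git commands:

```shell
# Hypothetical helper: print every path modified by the commits from
# $1 through $2 (inclusive). Merge commits show no paths by default.
paths_touched_in_range() {
    git log --format= --name-only "$1~1..$2" | sed '/^$/d' | sort -u
}

# Hypothetical helper: count the commits in revision (or range) $1
# that modify path $2.
commits_touching_path() {
    git rev-list --count "$1" -- "$2"
}
```

If `paths_touched_in_range 0c4068fb3 42b8fae4b` prints only bot.txt, and both `commits_touching_path 0c4068fb3~1 bot.txt` and `commits_touching_path 42b8fae4b..master bot.txt` print 0, then running `git filter-repo --invert-paths --path bot.txt` in a fresh clone should drop the file and, with it, the commits that become empty. git-filter-repo is a separate tool that has to be installed first, and the rewritten commits get new hashes, so the branch would need to be force-pushed afterwards.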

Previous Answer: This answer might still work for you, but as written it is intended for the more general case of a linear history. By adding the option --rebase-merges to the final rebase command below, you may still be able to accomplish your goal on a non-linear history. The main difference here is that the 54K commits will be squashed into one commit. If that range both creates the file and finally deletes it in its last commit, the squash would contain no net change, so those commits would effectively fall out of the repo as well.

Based on some information from the comments:

TTT asked:

For the "ones that actually have value", where do they fall in the history graph? For example, some good commits before the 54k, some good commits mixed in with the 54k, some good commits after the 54k?

And you answered:

The commits that don't have value are in the middle, starting at 0c4068fb3 and ending at 42b8fae4b. So all the commits that have value are before 0c4068fb3 and after 42b8fae4b

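If your history is linear, this is straightforward using the squash method, and I can guarantee you there won't be conflicts: the squashed commit ends with exactly the same files as 42b8fae4b, so every later commit is replayed onto the same content it was originally made on.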
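
Whether the affected part of the history is linear can be checked by counting merge commits. A small sketch, assuming the branch is master; the helper name count_merges_in_range is hypothetical:

```shell
# Hypothetical helper: count merge commits (two or more parents)
# in the revision range given as $1.
count_merges_in_range() {
    git rev-list --count --merges "$1"
}
```

For example, `count_merges_in_range 0c4068fb3~1..master` printing 0 suggests that everything from the first "bad" commit up to master is a single chain of commits.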

# Let's assume you are rewriting branch: master

# Back it up for sanity purposes
git branch master-backup master

# create a new branch
git switch -c temp-branch 42b8fae4b # start a branch from last "bad" commit
git reset --soft 0c4068fb3~1 # reset back to commit before first "bad" commit

# Note right now you have only "good" commits on your branch,
#  and all "bad" commit changes are staged, let's make 1 big commit
git commit -m "Squash all automated commits into one"

# now rebase the remaining commits (those after 42b8fae4b, up to master) onto temp-branch;
#  master ends up pointing at the rewritten history
git rebase --onto temp-branch 42b8fae4b master
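
One way to sanity-check the result, sketched under the assumption that the backup branch is named master-backup as above; the helper name check_rewrite is hypothetical:

```shell
# Hypothetical helper: compare a rewritten branch ($2) against its backup ($1).
check_rewrite() {
    # Squashing should leave the final file contents untouched.
    if git diff --quiet "$1" "$2"; then
        echo "trees identical"
    else
        echo "trees differ"
    fi
    # Show how many commits each branch contains.
    echo "$1: $(git rev-list --count "$1") commits"
    echo "$2: $(git rev-list --count "$2") commits"
}
```

Calling `check_rewrite master-backup master` should report identical trees and a much smaller commit count for master. Because the rewritten commits have new hashes, the branch would then have to be force-pushed, for example with `git push --force-with-lease`.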

Note it's also fairly simple to remove the bad commits instead of squashing, but you can't guarantee there won't be conflicts (unless you happen to know there won't be).
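
A minimal sketch of that removal variant, assuming the same linear history and commit IDs as above; the helper name drop_commit_range is hypothetical:

```shell
# Hypothetical helper: drop commits $1 through $2 (inclusive) from branch $3
# by replaying everything after $2 directly onto the parent of $1.
drop_commit_range() {
    git rebase --onto "$1~1" "$2" "$3"
}
```

Here that would be `drop_commit_range 0c4068fb3 42b8fae4b master`, ideally after creating the same backup branch as before. If a later commit depends on changes made in the dropped range, the rebase stops with a conflict, and `git rebase --abort` returns the branch to its original state.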