Git fork always commits ahead I don't want

Time:06-03

I have a fork B from an original repo A.

I also have my local (clone?) checked out version on my desktop (of B).

On my fork B, on the Git repo website it says

This branch is 2 commits ahead of A/master

So if I try to open any new pull requests, it always tries to include those commits. I don't want the 2 commits it thinks I'm ahead by (one of them was already pulled in, so I feel this has somehow got a bit messed up) :)

I just want to get B back to sync with A and my desktop synced with that.

On my desktop I've tried things like..

git remote add original A
git fetch original
git checkout original
... uploads some stuff 
git checkout original
error: pathspec 'original' did not match any file(s) known to git

I also tried something similar earlier additionally with

git reset --hard origin/master
git push --force origin master

But nothing seems to make any difference. Either I get an error, or everything just seems the same. My forked repo is 2 commits ahead of the master, my local desktop says everything is up to date.

How do I get around this, so my remote B is synced with A, and my desktop is synced with B. Happy to lose any local work etc.

CodePudding user response:

Part 2 of 2 (go here for part 1)

git push --force

So we now know how to move a branch name, locally. Now let's take another look at git push, and in particular, its --force or -f option. We normally use git push to send our new commits to some other Git repository. We then generally ask that other Git repository to add those commits to one of their branch names. If all we're doing is correctly adding commits, and we have permissions,7 the other Git will generally accept that push request.

But when we send them commits, we send them by hash ID, and each commit links back to earlier commits by hash ID. Commits don't use names internally, just hash IDs. If we have this:

...--G--H   <-- main, origin/main
         \
          I--J   <-- feature1 (HEAD)

then our origin/main implies that the last time our Git talked with their Git, their last main commit was commit H. That might still be true, but maybe—especially if this GitHub repo is shared with other people who run git push—just maybe somebody else has already added new commits to their main, so that over on GitHub, they have:

...--G--H--N--O--P   <-- main

We'll send them our I-J and they'll drop that into their big database,8 and they will have:

...--G--H--N--O--P   <-- main
         \
          I--J   [proposed update]

Any time we tell that other Git to move a branch name, they're going to check if that's OK. If we tell them to make a new name feature1, that probably would be OK, but let's say we decide, here, to ask them to set their main. They will answer us back with: No! If I make my name main point to J, I will lose my N-O-P commits! That's a Big NOPe! Remember, they, like every Git, find commits by using the branch name to find the last commit and then working backwards. J leads to I which leads to H, which does not lead forwards to N, only backwards to G.

This is usually how we like things like this to work. Instead of pushing directly to their main, we'd push our feature1 commits and ask them to create a new branch named feature1 and that would all be OK.
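If you'd like to watch that refusal happen without touching a real repository, here is a minimal sandbox sketch. Everything in it is an assumption made for illustration: the temporary directory, the repository names (seed, hub.git, ours, theirs), and the use of empty commits labelled H, N, I, and J to mirror the drawings above.

```shell
#!/bin/sh
# Sandbox sketch: all paths and repository names here are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)

# Build a shared "hub" repository whose main ends at commit H.
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/main
git -C "$sandbox/seed" commit -q --allow-empty -m "H"
git clone -q --bare "$sandbox/seed" "$sandbox/hub.git"

# Two people clone it.
git clone -q "$sandbox/hub.git" "$sandbox/ours"
git clone -q "$sandbox/hub.git" "$sandbox/theirs"

# Someone else adds N to the hub's main.
git -C "$sandbox/theirs" commit -q --allow-empty -m "N"
git -C "$sandbox/theirs" push -q origin main

# We, unaware of N, build I-J on feature1.
git -C "$sandbox/ours" checkout -q -b feature1
git -C "$sandbox/ours" commit -q --allow-empty -m "I"
git -C "$sandbox/ours" commit -q --allow-empty -m "J"

# Asking the hub to set its main to J would lose N, so this is refused.
if git -C "$sandbox/ours" push -q origin feature1:main 2>/dev/null; then
    main_push=accepted
else
    main_push=rejected
fi

# Asking it to create a new name, feature1, loses nothing, so this works.
git -C "$sandbox/ours" push -q origin feature1

hub_main=$(git -C "$sandbox/hub.git" log -1 --format=%s main)
hub_feature=$(git -C "$sandbox/hub.git" log -1 --format=%s feature1)
echo "push to main: $main_push; hub main ends at $hub_main; hub feature1 ends at $hub_feature"

rm -rf "$sandbox"
```

The hub here is just a bare repository on the local disk, so there are no hosting-site permissions involved; only the "would this lose commits?" check is being exercised.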

But ... suppose the Git repository on GitHub is yours, and you had:

...--G--H   <-- main (HEAD), origin/main

and then you added a bad commit I or pair of commits I-J to your main and ran git push origin main and they took them? Now you have:

...--G--H--I--J   <-- main (HEAD), origin/main

indicating that their main (your origin/main) points to commit J, just like your own main.

You now realize that I-J are bad and you run git reset --hard HEAD~2 to drop these two:

...--G--H   <-- main (HEAD)
         \
          I--J   <-- origin/main

If you now run git push origin main, your Git will send their Git any new commits you have that they don't—i.e., none—and then ask them to set their main to point to H, and they'll reject the request because that will lose commits I-J off their main.

But that's exactly what you want. You want them to drop the two bad commits. So the way you make that happen is that you use --force or the fancier --force-with-lease option:

git push --force origin main

This sends the new commits (none) and then, instead of politely asking them to make their main point to commit H, commands them to make their main point to commit H. They can still refuse, but again, provided you have permissions, they'll obey this time: Sir yes sir! main updated, commits ejected! And your Git repository will now have:

...--G--H   <-- main (HEAD), origin/main
         \
          I--J   [abandoned]

7Note that "base" Git doesn't have any concept of permissions to modify (push to) a branch, but most hosting servers—including GitHub—add that on too.

8Incoming commits actually go into a "quarantine zone" and are not migrated into the real database until they are accepted. This feature came from GitHub, because GitHub used to accept everything into their databases and only then reject them and that made a big mess for GitHub. So now there's this fancy quarantine feature.
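The reset-then-force-push sequence from this section can be tried end to end in a throwaway sandbox. This is only a sketch under stated assumptions: hub.git is a local bare repository standing in for GitHub, work is the local clone, and the commits are empty ones labelled H, I, and J to match the drawings. It uses --force-with-lease, the fancier variant mentioned above, which only forces if the remote branch is still where our origin/main says it is.

```shell
#!/bin/sh
# Sandbox sketch: hub.git stands in for the hosting site (hypothetical names).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
work="$sandbox/work"

# Local repository with commit H on main, pushed to a bare "hub".
git init -q "$work"
git -C "$work" symbolic-ref HEAD refs/heads/main
git -C "$work" commit -q --allow-empty -m "H"
git init -q --bare "$sandbox/hub.git"
git -C "$work" remote add origin "$sandbox/hub.git"
git -C "$work" push -q origin main

# Add the two "bad" commits and push them too.
git -C "$work" commit -q --allow-empty -m "I"
git -C "$work" commit -q --allow-empty -m "J"
git -C "$work" push -q origin main

# Drop I-J locally; origin/main still remembers J.
git -C "$work" reset -q --hard HEAD~2

# A polite push is refused because it would lose I-J on the hub.
if git -C "$work" push -q origin main 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi

# A forced push is a command rather than a request.
git -C "$work" push -q --force-with-lease origin main

local_tip=$(git -C "$work" rev-parse main)
hub_tip=$(git -C "$sandbox/hub.git" rev-parse main)
hub_subject=$(git -C "$sandbox/hub.git" log -1 --format=%s main)
echo "plain push: $plain_push; hub main now ends at $hub_subject"

rm -rf "$sandbox"
```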


More than one remote

Finally, we have enough to fix it all.

The first trick though is that you must, on your laptop or wherever you have your local clone, set things up so that you have two remotes, rather than just one. You ran:

git clone <url>

originally, where the URL was for your fork. Your fork is the one you want to adjust. We must now add a remote for the repository you forked.

A remote, remember, is just a short name to hold a URL, and Git will make up remote-tracking names using this short name. So you get to make up any name you like here. The standard first name is origin and you already have this one. Some people like to use upstream as their standard second name. I'm not a big fan of this because Git already has something else called an upstream. I'd use another name; here I'll use a silly one, but you should make up something sensible:

git remote add lexluthor <url>

Insert the URL for the repository you forked. Then run git fetch to that remote:

git fetch lexluthor

You now have, in your repository on your laptop, all of their commits (you might already have had all of them in which case this part went fast). You also have remote-tracking names for each of their branch names.
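To see what "a short name to hold a URL" and "remote-tracking names" look like in practice, here is a small sandbox sketch. The names are assumptions for illustration: A.git plays the repository you forked, B.git plays your fork, desktop is your local clone, and the "URLs" are plain local paths.

```shell
#!/bin/sh
# Sandbox sketch: A.git and B.git are local stand-ins (hypothetical names).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)

# A.git plays the original repository, B.git plays your fork.
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/master
git -C "$sandbox/seed" commit -q --allow-empty -m "H"
git clone -q --bare "$sandbox/seed" "$sandbox/A.git"
git clone -q --bare "$sandbox/A.git" "$sandbox/B.git"

# Cloning the fork gives us the first remote, origin.
git clone -q "$sandbox/B.git" "$sandbox/desktop"

# Add the second remote and fetch from it.
git -C "$sandbox/desktop" remote add lexluthor "$sandbox/A.git"
git -C "$sandbox/desktop" fetch -q lexluthor

# A remote is a short name for a URL...
lex_url=$(git -C "$sandbox/desktop" remote get-url lexluthor)
echo "lexluthor -> $lex_url"

# ...and each remote gets its own set of remote-tracking names.
tracking=$(git -C "$sandbox/desktop" for-each-ref --format='%(refname)' refs/remotes)
echo "$tracking"

rm -rf "$sandbox"
```

The listing should include both refs/remotes/origin/master and refs/remotes/lexluthor/master, which are the full spellings of origin/master and lexluthor/master.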

Now you just need to convince your GitHub fork that its branch (main, or master, or whatever it is called) should point to the same commit as the corresponding branch on lexluthor:

git push --force origin lexluthor/master:master

That's it: that's the whole thing. The arguments are the remote to talk to (origin), then source:destination, where the source is a commit in our own repository (named here by lexluthor/master) and the destination is the branch name on the remote (master). We send to origin any commits that we have that origin lacks and that they need to get their (origin's) branch updated: that's nothing at all, because we were "two ahead" and none at all behind. Then we command the Git over on GitHub to make its master (our origin/master) identify the same commit that our lexluthor/master identifies, which is the commit that master identifies in the repository you forked originally.

You probably also want your own master to drop the two commits you're ahead. You might want to keep those commits for some other reason / to put on another branch / whatever; for that:

git switch master
git status
# make sure it says "nothing to commit, working tree clean"
# if not, make a new commit now
git branch keep-extras
git reset --hard lexluthor/master

and now your master is in sync with both lexluthor and origin. Note that you could have used origin/master in the git reset line.
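Before running any of this against your real fork, you may want to rehearse it. The sketch below rebuilds the whole situation in a temporary directory and then applies the fix. It rests on assumptions made for illustration: A.git and B.git are local bare repositories standing in for the original repository and your fork, desktop is your clone, and the two unwanted commits are empty ones labelled I and J.

```shell
#!/bin/sh
# Sandbox rehearsal: A.git, B.git and desktop are hypothetical stand-ins.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
desktop="$sandbox/desktop"

# A.git is the original repository; B.git is the fork; desktop clones B.
git init -q "$sandbox/seed"
git -C "$sandbox/seed" symbolic-ref HEAD refs/heads/master
git -C "$sandbox/seed" commit -q --allow-empty -m "H"
git clone -q --bare "$sandbox/seed" "$sandbox/A.git"
git clone -q --bare "$sandbox/A.git" "$sandbox/B.git"
git clone -q "$sandbox/B.git" "$desktop"

# Recreate the problem: the fork ends up 2 commits ahead of A.
git -C "$desktop" commit -q --allow-empty -m "I"
git -C "$desktop" commit -q --allow-empty -m "J"
git -C "$desktop" push -q origin master

# The fix: second remote, fetch, forced push, then tidy the local branch.
git -C "$desktop" remote add lexluthor "$sandbox/A.git"
git -C "$desktop" fetch -q lexluthor
ahead_before=$(git -C "$desktop" rev-list --count lexluthor/master..origin/master)

git -C "$desktop" push -q --force origin lexluthor/master:master
git -C "$desktop" branch keep-extras
git -C "$desktop" reset -q --hard lexluthor/master
ahead_after=$(git -C "$desktop" rev-list --count lexluthor/master..origin/master)

upstream_tip=$(git -C "$sandbox/A.git" rev-parse master)
fork_tip=$(git -C "$sandbox/B.git" rev-parse master)
local_tip=$(git -C "$desktop" rev-parse master)
kept=$(git -C "$desktop" log -1 --format=%s keep-extras)
echo "ahead before: $ahead_before, ahead after: $ahead_after, keep-extras ends at $kept"

rm -rf "$sandbox"
```

The rev-list --count line counts commits reachable from origin/master but not from lexluthor/master, which is the same "ahead" number the hosting site shows for the fork.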

What we did was really simple. We just had to go around and around and do it the long way. That's Git for you!

CodePudding user response:

Part 1 of 2 (go here for part 2)

You've kind of jumped into a somewhat-advanced setup. There are three Git repositories you need to worry about here, not two, and GitHub "forks" are clones with some special properties. (Note that plain Git does not have forks and pull requests—these are GitHub add-ons. Other hosting sites also have fork add-ons and/or pull requests and/or merge requests: they are pretty common as add-ons. But none of them are in base Git.)

What you need to know to get started

Git is a Distributed Version Control System or DVCS. Git achieves its "distributed" effect by having multiple repositories, which Git calls clones. So you're going to need to know several things:

  • What, exactly, is a repository?
  • What does cloning a repository do?
  • What special things does a GitHub fork have that a clone doesn't?

We'll come back to the other two after we expand the first one just a bit. There's a lot more we could and should say but I've run out of space and have to split this up anyway...

A repository is mostly two databases

A Git repository is made up of two big databases, plus a lot of smaller ancillary items. The two databases are the important things, and one of them is usually much bigger, and is always much more important:

  • The bigger / more-important database is Git's object database. This holds Git commits and other internal Git objects. Everything inside this database has an OID or Object ID, which I prefer to call a hash ID (you'll see both terms, plus the now-outdated term SHA-1, referring to one specific kind of hash algorithm Git uses to get its hash IDs).

    The important entity for you in this big database is the commit. A Git repository could be (and can be, except for the annoyance of it that we'll see below) nothing but this database full of commits (plus their supporting objects). As such, you'll need to know exactly what a commit is, but we'll leave that for the next section.

    Each object, and hence each commit, gets an ID. Commits in particular get a unique ID: when you make a new commit, you get an ID that has never been used before, in any Git repository anywhere in the universe. When I make a new commit, I get a unique ID too. Everyone's new commit always gets a new ID. This part is the true magic of Git and enables its distributed nature, and it's also mathematically impossible and certainly doomed.1 Fortunately, commit hash IDs are so large that the day of doom is probably trillions of years off, long after not only you and I are dead, but the universe itself has more or less expired.

    In order to fish a commit out of the database, Git needs this hash ID. If that database were all there was in a repository, we'd all have to memorize hash IDs all the time. So...

  • The other, usually much smaller database holds names: branch names, tag names, and all other kinds of names. Each name holds one hash ID, which is all that's needed because of the clever design of a commit, which we'll get to in a moment.

Git stores certain hash IDs that it needs into the names database, under names that we (humans) choose. Then we (humans) just provide Git a name, like a branch name, and Git uses that to fish out the big ugly random-looking hash ID Git needs, to obtain the commit.

So a repository consists of these two databases: one full of commits and other supporting objects, and one with names, so that humans don't have to memorize hash IDs.
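You can poke at both databases with a few plumbing commands. This is a minimal sketch in a throwaway repository; the path, the branch name main, the tag name v0, and the commit message are all assumptions made for illustration.

```shell
#!/bin/sh
# Sandbox sketch: a throwaway repository with one commit (hypothetical names).
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
repo="$sandbox/repo"

git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/main
git -C "$repo" commit -q --allow-empty -m "first commit"
git -C "$repo" tag v0

# Names database: each name holds exactly one hash ID.
git -C "$repo" for-each-ref --format='%(refname) -> %(objectname)'

# Object database: hand Git a hash ID and it fishes out the object.
tip=$(git -C "$repo" rev-parse main)
kind=$(git -C "$repo" cat-file -t "$tip")
subject=$(git -C "$repo" log -1 --format=%s "$tip")

# A tag name is just another entry in the names database.
tag_target=$(git -C "$repo" rev-parse v0)
echo "main -> $tip, which is a $kind with subject: $subject"

rm -rf "$sandbox"
```

Here the branch name and the tag name both hold the same hash ID, which is why either one can be used to find the commit.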


1See the pigeonhole principle for details. On a simple basis, the fact that the hash ID is already spread pretty evenly across a 160-bit space reduces the collision chance to infinitesimal, but alas, the birthday problem rears its ugly head in turn, so once you have enough quadrillions of commits, it's more like the chance of having your computer explode, which actually can happen. (OK, "sort of" explode.)
