Issue with Git rebase invalid upstream 'develop'

Time:03-26

I have a situation whereby I created a feature branch from develop 3 months ago, and since then colleagues have been working on other feature branches that were merged into develop. I now want to integrate my changes into develop as well, without breaking their code or mine.

I researched this and found the answer here Easier way to keep a git feature branch up to date. Followed the steps below.

git checkout feature/foo
git pull --all
git rebase develop

I get the error: fatal: invalid upstream 'develop'

On reading the page further, I tried the below

git checkout develop
git pull
git checkout feature/xxxx
git merge develop 
git push

My question is this: after following the link specified earlier, why did I get the error fatal: invalid upstream 'develop'? I am asking for the cases where I want to use the rebase option.

CodePudding user response:

This answer is in two parts, because one is all background and one is the answer to the question you asked. You should probably read this part first, even though it's just background.

Background (Long)

I think that the Q-and-A pair in question here (the linked question plus its accepted answer) is just not very good. The tricky part here is that Git is both simpler and more complex than people think, and people get wrong ideas into their heads, which take a lot of work to get rid of and replace with the correct model.

The wrong model that people have in mind is that branches are somehow the thing in Git. But they're not: they're not the thing, whatever "the thing" may mean. The problems are that "branches"—whatever we may mean when we use that word loosely—are ambiguous, and when we mean "some subset of commits in a Git repository", they're simply a consequence. That is, these branches are like the fact that you have to breathe hard after sprinting. You don't sprint so that you'll have to breathe hard: you sprint to win a race, or to get exercise, or something along those lines. The breathing-hard part happens, but it wasn't the goal.

Similarly, branches (whatever we might mean by that) happen in Git because of the thing we—and Git—really do care about. "The thing", in this case, is the commit. Git is all about commits. Commits are the raison d'être for Git. As such, it's crucial to understand the following:

  • A Git repository is a collection of commits. In fact, a repository is, at its heart, two databases. Both are simple key-value stores. One holds Git's internal objects, including the commits (which are the objects we humans will generally care about here), and the other holds names—branch names, tag names, and other names.

  • Commits are numbered. Every internal Git object gets a number; commits in particular get a globally unique number, which we call a hash ID, or sometimes a Git OID (Object ID). In the past, Git called these SHA-1 hash IDs (because the current OIDs are in fact SHA-1 hashes), but Git is moving to SHA-2 due to SHA-1 having been effectively broken.

  • Each commit in turn stores two collections. We'll get back to this in a moment.

The fact that each commit has a totally unique number means that any two Git repositories, on contact with each other, can tell whether they contain the same commits just by looking at the numbers. Your Git software, working with your Git repository, can reach out to other Git software working with another Git repository: you might call this other Git origin, for instance. Your Git thus calls up the Git at origin and has them list out (some of) their commit hash IDs. If your Git has the same IDs, you and they have the same commits and you're in sync. If not, one of you has some commits that the other doesn't, and/or vice versa. Git is generally quite greedy for commits, so at this point one Git (the one with extra commits) will give the other Git the commits that it lacks. The receiving Git will add those commits to its collection, rather Borg-like. ("We will add your biological and technological distinctiveness to our own.")

The numbering system has a few consequences. One is that because the numbers are cryptographic digests, they're quite random-looking, and inhospitable to humans. Nobody can remember all the hash IDs. Fortunately we don't have to do that: the computer can do that, and the computer is good at that. The other is that because the commit's hash ID is a cryptographic checksum of the commit's content, no part of any commit can ever be changed.
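A quick way to see the content-to-ID link for yourself is git hash-object, which prints the ID Git would assign to some content. This is only a sketch: it hashes small blob (file) objects rather than commits because those are the easiest to produce, and the sample strings are made up. Commit IDs are derived from the commit's content in the same way.

```shell
# Identical content always yields the identical object ID;
# changing even one character yields a different ID.
id_a=$(printf 'hello\n' | git hash-object --stdin)
id_b=$(printf 'hello\n' | git hash-object --stdin)
id_c=$(printf 'hello!\n' | git hash-object --stdin)

echo "same content:    $id_a $id_b"
echo "changed content: $id_c"
```

Since the ID is computed from the content, "editing" an object really means making a new object with a new ID.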

The simple part of Git

Every Git commit has two parts:

  • Each commit stores a full snapshot of every file. The files inside a commit are stored in a special, read-only, compressed and—important for various reasons—de-duplicated format. Because all parts of every commit are read-only, it's safe for any commit to share any of its file content with any other commit (or even other parts of the same commit). So no matter what you do with commits—e.g., add a million identical ones—you won't bloat up the repository with duplicated files, even though every commit stores every file in a logical sense.

    This snapshot aspect of a commit means that it's easy to get every version of the stuff you've ever committed: just find the right hash ID of that commit and there are all your files, exactly as they were at the time you committed them. So everything is saved for all time, or at least, for as long as you can find the commits' hash IDs.

  • Separately from the snapshot, each commit stores metadata: information such as who made the commit—name and email address—and when, or why they made the commit (their log message: the meaningfulness of this depends on the human, so not every commit message has a good "why" in it).

Now we get to the sneaky tricks in Git. These are not complicated—not yet anyway—but they're the first key to understanding branching. In the metadata for any commit, Git stores a list of previous or parent commit hash IDs. This list is usually exactly one entry long, giving each commit a single parent hash ID. This kind of commit is an ordinary commit, the kind you make every day, and when we lay them out next to each other in the order you make them, with the latest at the right:

... <-F <-G <-H

we get a simple backwards-looking chain. Here H stands in for the real hash ID of the latest commit you just made. It has a snapshot and metadata, and in its metadata, commit H stores the raw hash ID of earlier commit G. Because Git has a simple key-value store, in which it can look up G's hash ID and obtain commit G, Git can actually work with both commit H and commit G "at the same time", as it were. We just have to give Git the hash ID of commit H.

Commit G, though, is an ordinary commit: it has a snapshot and metadata, and in its metadata, commit G stores the raw hash ID of earlier commit F. So Git can look up the actual commit itself, using just the hash ID of G to find G's metadata to find F's hash ID. So now Git has the G-and-F pair.

In other words, starting from H, Git was able to move back one to G, and from there, Git was able to move back one step again to F. Commit F is of course also an ordinary commit, with one parent, so Git can now move back one more step. Git can repeat this forever, or at least, until it gets back to the very first commit. This first commit can't point backwards, so it just doesn't:

A <-B ... <-G <-H

and if we have Git start at H and work backwards one hop at a time, Git eventually reaches commit A and stops there.

This is the history in the repository. The commits contain the snapshots; every commit stores every file (with de-duplication); and by moving backwards, one commit at a time, Git finds every commit in this simple linear chain. There's one big hitch though: we have to give Git the hash ID of commit H. How do we find that?
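The backwards walk can be tried in a throwaway repository. The following is a minimal sketch, assuming a temporary directory, a placeholder identity (demo), and three empty commits named A, B and C; none of these names come from your real repository.

```shell
set -e
# Placeholder identity so that git commit works without any global config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
for msg in A B C; do
    git commit -q --allow-empty -m "$msg"
done

# The metadata of the newest commit names its parent by raw hash ID.
git cat-file -p HEAD

# Walk backwards one hop at a time until a commit has no parent.
commit=$(git rev-parse HEAD)
hops=0
while parent=$(git rev-parse --verify --quiet "$commit^"); do
    commit=$parent
    hops=$((hops + 1))
done
root_subject=$(git log -1 --format=%s "$commit")
echo "reached root commit '$root_subject' after $hops hops"
```

This loop is roughly what git log does for you, starting from whatever hash ID it is given.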

Branch and other names

This is where branch names enter the picture. In Git, a branch name (or any other name, for that matter) just contains one hash ID. Assuming that's a commit hash ID,[1] that gives us, or Git, a last commit to start from. From there, Git can work backwards. Since commits point backwards to their parents, and that's the history in the repository, this is how Git finds history.

Note that if we have more than one branch name, we can have more than one "last commit". To illustrate that, suppose we have a chain of commits that ends at commit H:

...--G--H   <-- main

We now create two more branch names, such as br1 and br2, both of which also point to H at the moment:

...--G--H   <-- br1, br2, main

All the commits are on all three branches at this point. But as we make new commits, Git will move one (and only one) branch name "forward" while we do that. If we start with br1 and make a new commit I, it will point back to existing commit H and drag the name br1 forward:

          I   <-- br1
         /
...--G--H   <-- br2, main

When we make a second new commit we get:

          I--J   <-- br1
         /
...--G--H   <-- br2, main

The name br1 points to J; J points backwards to I; I points backwards to H; and so on. So by starting at br1, Git will find all the commits. Starting at br2 or main, Git will find only the commits that end at H. We have two branches—or is it three branches? That depends on what we mean by the word branch, doesn't it?

Anyway, suppose we now switch to using the name br2 and make two more commits. Now we'll have:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2

We now seem to have three branches. We definitely have three branch names. Which branch(es) contain commits up through H? Git's answer is: all of them. In fact, we can safely delete the name main at this point, if we don't care to find H directly:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2

Now we only have two branches. We still have exactly the same commits though. The branches don't matter! It's only the commits that matter.
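Here is a small sketch of that sequence in a scratch repository. It assumes a temporary directory, a placeholder identity, and empty commits whose subjects (H, I, K) mirror the drawings; it makes one commit per side branch instead of two to keep it short.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main   # call the initial branch "main" on any Git version
git commit -q --allow-empty -m H
h=$(git rev-parse HEAD)

git branch br1
git branch br2
git rev-parse main br1 br2              # all three names hold the same hash ID

git checkout -q br1
git commit -q --allow-empty -m I        # only br1 moves forward
git checkout -q br2
git commit -q --allow-empty -m K        # only br2 moves forward

git branch -q -D main                   # delete the name, not the commit

remaining=$(git for-each-ref --format='%(refname:short)' refs/heads | tr '\n' ' ')
echo "branch names left: $remaining"
git merge-base --is-ancestor "$h" br1 && echo "H is still in br1's history"
git merge-base --is-ancestor "$h" br2 && echo "H is still in br2's history"
```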

That doesn't mean that the branch names are useless, of course. We—and Git—use them to find last commits. If we want to find any particular given "last" commit quickly—e.g., if we want to count H as a "last" commit, even though it's also an intermediate commit—we'll need a name for it.


[1] Non-branch-names can sometimes contain non-commit hash IDs. This is mainly a feature for making tags more useful. Branch names must always hold commit hash IDs.


The complicated parts of Git

I said above that commits are read-only snapshots with metadata. This is true: the files inside a commit literally cannot be changed, and furthermore, they are in a format that other (non-Git) programs cannot even read. You literally can't do any work with these files! But we need to get work done, so how do we do that?

Git's answer is: you don't work on, or with, the committed files. Instead, Git extracts a commit, into a work area, where you actually do your work. What this means is that, literally, the files you work with in Git, are not in Git. They get copied out of Git and you work on and with the copies.

When you go to make a new commit, you still don't use these files directly. Instead, Git has stored what amount to copies of these files,[2] ready to go into a new commit. This extra copy of each file occupies something for which Git has three names: the index, the staging area, or (rarely these days) the cache. All three names refer to this same thing, which I like to describe as your proposed next commit.

This explains what git checkout or git switch is doing. When we use either of these commands and give it a branch name, we're really picking the commit we'd like to extract. For instance, if we have:

          I--J   <-- br1
         /
...--G--H   <-- develop, main
         \
          K--L   <-- br2

and we run:

git switch main

we are telling Git that we want to start working on / with the files that are in commit H. Git should now:

  • erase, from our work area and proposed next commit, the current files that are there from a previous checkout;
  • install, into our work area and proposed next commit, the files from commit H.

To remember which branch name we're using, we'll update our drawing like this:

          I--J   <-- br1
         /
...--G--H   <-- develop, main (HEAD)
         \
          K--L   <-- br2

The special name HEAD, in all uppercase, is attached to just one branch name. That's the branch name of the branch we are "on". So if we were on br1:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- develop, main
         \
          K--L   <-- br2

and are now on main, Git has swapped out all the commit-J files for all the commit-H files.

There are some special cases here. Sometimes Git doesn't have to switch out the files. Suppose that we were on develop, which means "commit H", when we ran git switch main to switch to commit H. We're telling Git to switch from H, to H. That's not really much of a switch, is it? In this case Git doesn't have to change out any files, and so it just doesn't bother.

This case becomes important if we don't have a develop yet. Suppose we're on main:

          I--J   <-- br1
         /
...--G--H   <-- main (HEAD)
         \
          K--L   <-- br2

and we start changing a bunch of files. Then we realize: Hey, wait, I meant to do this work on a new branch. We can create a new branch and switch to it right now, and as long as the new branch also means "commit H" right now, that switch is totally free, because Git won't need to swap out any files. We can leave our partially-completed work just partially-completed, creating a new branch name test2 for instance:

          I--J   <-- br1
         /
...--G--H   <-- main, test2 (HEAD)
         \
          K--L   <-- br2

If and when we eventually make a new commit—let's call it N—we'll get:

          I--J   <-- br1
         /
...--G--H   <-- main
         \__
          \ `--N   <-- test2 (HEAD)
           \
            K--L   <-- br2

New commit N will point back to old commit H as its parent, because we made commit N from commit H.
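The "I meant to do this on a new branch" case can be sketched as follows. This assumes a scratch repository with a single file, notes.txt, and a placeholder identity; the file name and its contents are invented for the example, and git checkout -b is used in place of git switch -c only so that it runs on older Git versions too.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m H

echo "work in progress" >> notes.txt    # start editing while still on main
git checkout -q -b test2                # same commit, so no files get swapped out

current=$(git symbolic-ref --short HEAD)
echo "now on: $current"
git status --short                      # notes.txt still shows as modified

git commit -q -am N                     # the new commit lands on test2 only
main_subject=$(git log -1 --format=%s main)
test2_subject=$(git log -1 --format=%s test2)
test2_parent_subject=$(git log -1 --format=%s test2^)
echo "main: $main_subject, test2: $test2_subject (parent: $test2_parent_subject)"
```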

This index or staging area (the extra copies of each file that make up the proposed next commit) explains why you have to run git add. When you do run git add, Git will:

  • read the working tree copy;
  • compress it into Git's internal format; and
  • check for any existing (duplicate) copy.

If there's some existing copy, Git can discard the compressed version it just made, and use the duplicate. If not, Git will arrange for the new compressed version to go into the repository if and when we finally do commit it.[3]

Although this is a bit complicated, the parts to memorize aren't that bad:

  • You don't work on committed files. You work on copies of them. Git extracts the copies from some existing commit.
  • The files you do work on are not in Git. They're copies.
  • Until you run git add on them, Git doesn't even care if files have been updated. You should run git status often enough to see which files you haven't yet git added.
  • The git add step means make the index / staging copy match the working tree copy. That is, it updates your proposed next commit.
  • When you run git commit, Git makes the new commit's snapshot from the proposed next commit. This is why you have to git add: to update the proposed next commit, so that git commit will commit that.
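The points above can be watched in action by comparing the three copies of a file before and after git add. This is a sketch in a scratch repository; file.txt and its contents are made up. Here git diff compares the work tree against the index, and git diff --cached compares the index against the current commit.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
echo "one" > file.txt
git add file.txt
git commit -q -m "first"

echo "two" >> file.txt
# The work tree copy has changed; the index copy still matches the commit.
unstaged_before=$(git diff --name-only)
staged_before=$(git diff --cached --name-only)

git add file.txt
# Now the index (the proposed next commit) matches the work tree instead.
unstaged_after=$(git diff --name-only)
staged_after=$(git diff --cached --name-only)

echo "before add: unstaged='$unstaged_before' staged='$staged_before'"
echo "after add:  unstaged='$unstaged_after' staged='$staged_after'"
```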

This also leads to a proper understanding of git status—but we'll come back to that in a moment.


[2] The "copies" in Git's index or staging area are already de-duplicated, and remain that way at all times, so unless you've altered a file and run git add, these copies take no space. Technically, what's in the index is really just the file's name, mode, hash ID, and cache data, plus a slot number used during merging; we won't cover this at all.

[3] Technically, Git adds a new blob object immediately. If we end up not committing it after all, Git will eventually clean it up, providing we don't wind up doing another git add and git commit that does eventually commit it. So if you have a very big file (say, a few dozen terabytes or petabytes), you probably don't want to git add it unless and until it's really necessary. For small files, though, it usually doesn't matter.


Snapshots vs diffs

I keep coming back to the concept of snapshots, because commits are snapshots (plus metadata). But if we look at a commit with, say, git show or git log -p, we don't see a snapshot. Instead, we see a diff:

$ git show | head -25 | sed 's/@/ /'
commit f01e51a7cfd75131b7266131b1f7540ce0a8e5c1
Author: Junio C Hamano <gitster pobox.com>
Date:   Mon Mar 21 14:18:51 2022 -0700

    The thirteenth batch
    
    Signed-off-by: Junio C Hamano <gitster pobox.com>

diff --git a/Documentation/RelNotes/2.36.0.txt b/Documentation/RelNotes/2.36.0.txt
index d67727baa1..f1449eb926 100644
--- a/Documentation/RelNotes/2.36.0.txt
+++ b/Documentation/RelNotes/2.36.0.txt
 @ -74,6 +74,10 @@ UI, Workflows & Features
    refs involved, takes long time renaming them.  The command has been
    taught to show progress bar while making the user wait.
 
+ * Bundle file format gets extended to allow a partial bundle,
+   filtered by similar criteria you would give when making a
+   partial/lazy clone.
+
 
 Performance, Internal Implementation, Development Support etc.
 
 @ -132,6 +136,12 @@ Performance, Internal Implementation, Development Support etc.
 

The things with the @s in them are diff hunks, and before this we get a diff header:

diff --git a/Documentation/RelNotes/2.36.0.txt b/Documentation/RelNotes/2.36.0.txt
index d67727baa1..f1449eb926 100644
--- a/Documentation/RelNotes/2.36.0.txt
+++ b/Documentation/RelNotes/2.36.0.txt

What Git has done is to take commit f01e51a7cfd75131b7266131b1f7540ce0a8e5c1, use its metadata to find its parent bc3838b310b32081d48393ba0dcf26e4735c6d19, and extract the file Documentation/RelNotes/2.36.0.txt from both commits. On the "left" (as a/), Git puts the earlier version of the file; on the "right" (as b/), Git puts the later version of the file. Then Git plays a game of Spot the Difference. The first difference Git saw was that Junio added four lines around line 77. The diff shows the added lines, plus a bit of context, then moves on to the next change that Git found, which is to add more lines around line 135 (in the old version) or 139 (in the new one).

In other words, Git uses the metadata in the commit to find the (single) parent. This gives us two snapshots, which Git can compare. But in fact, Git can make a diff from any two snapshots, not just ones that are right next to each other:

...--E--F--G--H   <-- somebranch (HEAD)

Here git show will compare G and H, as those are the two adjacent commits, but we can run:

git diff <hash-of-E> HEAD

and have Git compare the snapshots in E and H directly, and show that as a diff. This all works because every commit holds a full snapshot, and Git can easily compare any two snapshots. In fact, due to the internal de-duplication, Git can compare two snapshots very quickly as long as most of the files are duplicates: it only has to look at those files that aren't duplicates. So overall, this is quite easy for Git.
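As a rough illustration, the sketch below builds three commits in a scratch repository and diffs both an adjacent pair and a non-adjacent pair. The file log.txt and the commit subjects are invented, and HEAD~1 and HEAD~2 stand in for the raw hash IDs.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
for n in 1 2 3; do
    echo "line $n" >> log.txt
    git add log.txt
    git commit -q -m "commit $n"
done

# Adjacent snapshots: the same pair that git show would compare.
git --no-pager diff HEAD~1 HEAD

# Non-adjacent snapshots compare just as easily.
git --no-pager diff HEAD~2 HEAD

# --numstat prints "added<TAB>deleted<TAB>path"; keep only the added count.
added_adjacent=$(git diff --numstat HEAD~1 HEAD | cut -f1)
added_far=$(git diff --numstat HEAD~2 HEAD | cut -f1)
echo "lines added: $added_adjacent (adjacent), $added_far (two commits apart)"
```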

Merging

This all leads us to git merge, which is where Git gets much of its real power. Let's go back to this setup again:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

This tells us that we're "on" branch br1 (that is, we did a git checkout br1 or a git switch br1) and that we're using the files from commit J. Let's also say that we haven't touched any of these files (so that the index and working tree copies all match the commit-J copies). We now run:

git merge br2

Our goal here is to combine changes. That is, we want to take any work we, or someone else, did on our br1 branch, and any work we or anyone else did on the br2 branch too, and combine the work.

We just saw that Git doesn't store changes. But we also saw that Git can easily compare any two commits. How will we get Git to combine changes? We have to do some diff-ing.

We could compare the snapshot in J to the one in L, but that doesn't really get us what we want. The trick here is to use the metadata a little differently. Commit J has parent I, and commit I has parent H, which has parent G, and so on, backwards. Meanwhile commit L has parent K, which has parent H, and that goes back to G, and so on. Some of these parents are shared. In fact, as soon as we get back to H, every parent from there backwards is shared. That means commit H, which is on both branches, is the best shared parent. Git calls this "best" shared parent the merge base.[4]

By using this best common ancestor, or merge base, Git can run two git diffs:

git diff --find-renames <hash-of-H> <hash-of-J>   # what we changed
git diff --find-renames <hash-of-H> <hash-of-L>   # what they changed

These two diffs apply to the same snapshot—the one in commit H—and now Git can combine the diffs. As long as we touched files they didn't and they touched files we didn't, that's easy. When we and they touched the same files, Git's rules here are simple: if we didn't touch the same lines, and our changes don't butt up against each other, Git takes both changes. If we did touch the same lines, Git requires that we make the same change to those same lines, and then Git takes one of those changes. If our changes can't be combined, Git calls that a merge conflict.

This is what merge conflicts are about. Git has picked some merge base commit, and has diff-ed its snapshot against two other commits' snapshots. Git is now trying to combine changes. Git has encountered a case where its simple, line-based, text-oriented rules don't have a simple answer for how to combine these, so Git says "conflict".
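You can ask Git which commit it would pick as the merge base with git merge-base. The sketch below assumes a scratch repository in which br1 and br2 each add one (made-up) file on top of a shared commit H, then runs the same two diffs described above, with --name-status added to keep the output short.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main
echo "base" > shared.txt
git add shared.txt
git commit -q -m H

git checkout -q -b br1
echo "ours" > ours.txt
git add ours.txt
git commit -q -m I

git checkout -q -b br2 main
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m K

base=$(git merge-base br1 br2)
base_subject=$(git log -1 --format=%s "$base")
echo "merge base is commit $base_subject"

git --no-pager diff --find-renames --name-status "$base" br1   # what "we" changed
git --no-pager diff --find-renames --name-status "$base" br2   # what "they" changed
```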

Note: Git can also detect files that were all-new, or removed entirely, or renamed. This produces a different kind of merge conflict—some call it a tree conflict; I call it a high level conflict—that doesn't involve particular lines within a file, but rather some entire thing to do with that file. For instance, suppose we added some functions to subroutines.py and they deleted subroutines.py entirely. Git has no idea how to combine "add these lines" with "delete this file", so it will call that a modify/delete conflict.

In all these conflict cases, Git dumps the job of resolving the conflict onto the human, who presumably understands the file's contents. The human doesn't just apply simple text-substitution rules. The human knows whether changing red ball to blue ball on one side of the merge, and red ball to red cube on the other side, should result in blue cube, or maybe in green pyramid or whatever.

But if there isn't a conflict—if the merge goes smoothly—Git will take the combined changes, whatever those wind up being, and apply them to the base snapshot. That is, given:

          I--J   <-- br1 (HEAD)
         /
...--G--H
         \
          K--L   <-- br2

Git combines our H-vs-J changes with their H-vs-L changes and applies both changes to H. That keeps our work and adds theirs, or keeps their work and adds ours, however you'd like to look at it. Then Git makes a new commit from this result, and this new commit is special in exactly one way:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

Commit M is a merge commit. It has a snapshot, just like any commit. It has metadata, just like any commit. What's special about it is that instead of one parent, it has two. Commit M points back to existing commit J, in the way a new commit does. But commit M also points back to merged commit L.

This—the two parents—is what makes commit M a merge commit. Of course the snapshot, in this case, is the result of merging changes as well, but that's not what makes M a merge commit. It's the two parents that make M a merge commit.

Note that, as usual, Git has updated the current branch name to point to the new commit. So br1 now means "commit M", not "commit J". No commits have changed—no commits can ever change—but the branch name has moved as usual.
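The two-parent property is easy to check after a real merge. This is a sketch under the same kind of assumptions as before: a scratch repository, a placeholder identity, and one invented file per branch so that the merge has no conflicts.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m H

git checkout -q -b br1
echo "ours" > ours.txt
git add ours.txt
git commit -q -m J

git checkout -q -b br2 main
echo "theirs" > theirs.txt
git add theirs.txt
git commit -q -m L

git checkout -q br1
j=$(git rev-parse br1)
git merge -q -m M br2                   # no conflict, so Git commits M itself

# rev-list --parents prints the commit followed by each of its parents.
words=$(git rev-list --parents -n 1 HEAD | wc -w)
parent_count=$((words - 1))
first_parent=$(git rev-parse HEAD^1)
second_parent=$(git rev-parse HEAD^2)
echo "M has $parent_count parents"
```

The first parent is the commit br1 pointed to before the merge, and the second is the commit that was merged in.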

What's unusual is that because M points back to L as well as to J, we may no longer care about finding commit L with a branch name. It's now safe to delete the name br2:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L

because we can still find all the commits by starting at M and working backwards. It's trickier now, because when we step back once from M, we have to visit both commits J and L. Then we have to visit both I and K, and then we visit H once, and then step back to G, and so on. Git knows how to do this, but it is tricky, and this is one of the harder things to understand in Git. Peculiarly, the split—where the two branches fork off from H initially—is actually easier, and the merge at M, where the two branches come together, is hard. That's because Git works backwards, and when we work backwards, it's the merges that split, and the splits that merge, as it were. But remember that commits hold snapshots and metadata and you'll be fine: the merge holds a snapshot. Preparing the snapshot might have been hard, and figuring out why it's that snapshot, despite what's in the parent commits J and L might be hard, but it's still just a snapshot.

Do note, though, that the "show a commit as a patch" trick that git log -p uses for ordinary commits stops working here. When we have:

...--G--H   <-- branch (HEAD)

Git will compare the G and H snapshots to show a diff, but when we have:

...--J
      \
       M   <-- branch (HEAD)
      /
...--L

which snapshot should Git compare to the snapshot in M? The answer git log uses by default is "this is too hard, so I won't bother showing anything at all". That's not a very good answer, but be aware of it. (There is no single right answer to the dilemma, but it might be nice if git log -p inserted something to indicate it didn't bother to do any work here.)


[4] Technically, the merge base of two commits is found using the Lowest Common Ancestor algorithm on the DAG formed by the commits. Sometimes there's more than one LCA, and this complicates merging, but we'll ignore this case entirely here.

CodePudding user response:

I'm going to assume a lot of background, so see this answer first.

I researched this and found the answer here Easier way to keep a git feature branch up to date. Followed the steps below.

git checkout feature/foo
git pull --all
git rebase develop

Let's assume here that feature/foo is an existing branch name in your repository. The reason to assume it is this:

  • The git checkout command has some extra features. These used to be always-on, but you can now turn off the "guess" mode with --no-guess.

  • If you don't turn off guess mode, git checkout will sometimes create a new branch name (and then do the checkout).

Let's pause now and look at what happens when you connect two Git repositories to each other, e.g., using the name origin. I mentioned earlier that when two Git repositories meet, one of those two repositories generally winds up giving the other Git software some or all of any new-to-it commits. They use the raw hash IDs, and their branch names—each of the two Git repositories has its own branch names; it's only the commits and their hash IDs that are literally shared—to accomplish this.

When you use git pull, you're telling your Git to run two Git commands:

  1. git fetch;
  2. a second Git command that you can configure.

We'll get to the second command after talking about git fetch. The fetch command has your Git reach out to some other Git repository. To do that, your Git needs a URL. Your Git has saved a URL under the name origin, and your git fetch is using the name origin to retrieve the saved URL, so your Git reaches out to whatever URL that is. Some other Git software answers the "Internet phone call" your Git makes to that URL. That Git software reads their Git repository, and announces to your Git all of their branch names and all the commit hash IDs that go with those names.

Your Git now looks at those hash IDs first. If you already have the hash ID, you already have their commit. But if not, you're missing some commit(s): your Git software will now ask their Git please send that commit and tell me about its parent(s). Their Git software will send over the parent hash IDs; your Git will check to see if you have those, and if not, request those too, and so on. In the end, then, you'll get all the new commits they have: any commits they have, that you didn't, well, now you do have them. They're all safely tucked away in your own Git repository now.

Once you have all of their commits, your Git now saves all of their branch names. But your Git doesn't use their branch names as your branch names. Instead, your Git takes their names—master or main, develop, feature/foo, whatever names they have—and sticks origin/ in front. Why origin/? Because you called up their Git using the name origin. That name, origin, provided the prefix for these renamed things.

The renamed things are not exactly branch names. They correspond to the other Git's branch names. Git calls them remote-tracking branch names, but since they're not branch names, I like to simplify this to just remote-tracking names.

If we draw the commits in your repository the way I like to, we may end up with something like this, before you run git pull which runs git fetch:

          K--L   <-- feature/foo (HEAD)
         /
...--G--H   <-- main, origin/main
         \
          I--J   <-- origin/develop

The reason you have a main is that your Git created one right after you ran git clone. The reason you have a feature/foo is that you created this one. The reason you have an origin/main and an origin/develop is that they had a main and a develop at the time you ran git clone. Your Git got all of their commits, but didn't create branch names for their branches: your Git made remote-tracking names instead.
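To see which names a fresh clone really has, here is a sketch that uses a second local repository as a stand-in for the other Git. The directory names upstream and clone are invented, and the "server" is just a folder on disk, but the naming behaviour is the same as with a real URL.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m H
git branch develop                      # "they" have both main and develop

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

git branch --list                       # your own branch names
git branch --remotes                    # your remote-tracking names

local_names=$(git for-each-ref --format='%(refname:short)' refs/heads | tr '\n' ' ')
if git show-ref --verify --quiet refs/remotes/origin/develop; then
    tracking_develop=yes
else
    tracking_develop=no
fi
if git show-ref --verify --quiet refs/heads/develop; then
    local_develop=yes
else
    local_develop=no
fi
echo "local branches: $local_names"
echo "origin/develop exists: $tracking_develop, local develop exists: $local_develop"
```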

Let's assume now that they've made a couple of new commits on their develop. You run git fetch (via git pull), so your Git (your software on your repo) calls up their Git (their software on their repo) and their Git lists out their develop as meaning commit O. You don't have O so your Git asks their Git to send O; they mention that O's parent is N, and you don't have N so your Git asks them to send that too; they say that N's parent is J, which you do have, so that finishes off the set of commits they need to send.

They send those commits to you, and your Git adds them to its distinctiveness, Borg-like:

          K--L   <-- feature/foo (HEAD)
         /
...--G--H   <-- main, origin/main
         \
          I--J   <-- origin/develop
              \
               N--O

Having done all that, your Git now updates your origin/develop to point to commit O, since their develop now points to commit O:

          K--L   <-- feature/foo (HEAD)
         /
...--G--H   <-- main, origin/main
         \
          I--J--N--O   <-- origin/develop

Your git fetch step is now complete. Your Git disconnects from their Git—you don't need the Internet any more, at this point.
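The fetch step can be reproduced with two local repositories. In this sketch, upstream plays the part of the Git at origin; the directory names and the empty commits H, J, N and O are assumptions that only loosely follow the drawings above.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m H
git checkout -q -b develop
git commit -q --allow-empty -m J
git checkout -q main

git clone -q "$work/upstream" "$work/clone"

# "They" add two commits to their develop after we cloned.
git checkout -q develop
git commit -q --allow-empty -m N
git commit -q --allow-empty -m O

cd "$work/clone"
before=$(git rev-parse origin/develop)
main_before=$(git rev-parse main)
git fetch -q origin
after=$(git rev-parse origin/develop)
main_after=$(git rev-parse main)
theirs=$(git -C "$work/upstream" rev-parse develop)

echo "origin/develop moved: $before -> $after"
echo "their develop:        $theirs"
```

Only the remote-tracking name moved; your own branch names are untouched by git fetch.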

Your Git goes on to do the second command that git pull runs. The default second command here is git merge. Here's where things get tricky.

Remember that we ran git merge br2 in our earlier example. We have to tell git merge which commit to merge. We usually use a branch name for that, or (in this case) maybe a remote-tracking name. But they have at least two branch names, main and develop, which turned into your origin/main and origin/develop.

What git pull does here is to use an upstream. Each branch name in your repository—main and feature/foo—can have one upstream set. The upstream of a branch is something you would set with:

git branch --set-upstream-to

You do not have to have an upstream set but if you do not have one set, git pull will just complain. My guess here is that, at some point in the past, you ran:

git push -u origin feature/foo

which sent your commits K-L to the other Git over at origin, so that they have them. This then also asked them to create, in their repository, the name feature/foo. Note that there's no origin/ prefix here: in their repository, this is an actual branch name.

When you do all this, they get the new commits you have that they don't—commits K and L—and then they do create their branch feature/foo. That means your Git knows to create origin/feature/foo in your repository, so that the picture really looks like this now:

          K--L   <-- feature/foo (HEAD), origin/feature/foo
         /
...--G--H   <-- main, origin/main
         \
          I--J--N--O   <-- origin/develop

The -u flag to git push tells git push to run git branch --set-upstream-to for you after the git push succeeds. The name that git push uses, when it does this, is origin/feature/foo:

git branch --set-upstream-to=origin/feature/foo feature/foo

which means that the upstream of your feature/foo is origin/feature/foo.

So, your git pull obtained their new commits on their develop and updated your origin/develop. It then updated—or would have updated—your origin/main and origin/feature/foo, except that there was nothing to do here, because those names already selected the right commit.

Then, your Git ran the equivalent of:

git merge origin/feature/foo

to complete your git pull. But that said to merge commit L with commit L, which does nothing at all. So Git doesn't bother trying to merge: it just says Already up to date.
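To check this on a scratch copy, you can ask Git what the upstream of feature/foo is and compare the two commits. The following is a sketch, not your actual setup: the repository names theirs and yours, the commit helper, and the example identity are assumptions made for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"

# Hypothetical helper: make one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

# "Their" repository with a single commit H on main.
git init -q theirs
( cd theirs && git symbolic-ref HEAD refs/heads/main && commit H )

git clone -q theirs yours
cd yours
git checkout -q -b feature/foo
commit K; commit L

# Send K and L, ask them to create feature/foo, and set the upstream.
git push -q -u origin feature/foo

upstream=$(git rev-parse --abbrev-ref 'feature/foo@{upstream}')
mine=$(git rev-parse feature/foo)
theirs_copy=$(git rev-parse origin/feature/foo)
echo "upstream of feature/foo: $upstream"
echo "feature/foo:        $mine"
echo "origin/feature/foo: $theirs_copy"
```

Since both names select the same commit, a git pull run on feature/foo at this point has nothing to merge.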

Side note: --all

Git takes that --all flag in git pull --all and hands it to git fetch. Here, it means all remotes. That's all --all means: all remotes. Most people have just one remote, named origin. If you have more than one, this makes your git pull fetch from all of them. If you just have the one remote, the --all flag does absolutely nothing. In any case, it has no effect on the second command that git pull runs. So you never need --all here, and probably should never use it: it does nothing you would care about.

git rebase

The rebase command is complicated, and I won't go into details here (look for other SO answers), but at its heart, it means:

  • I have some commits.
  • I like most of the things about these commits, but there's something I don't like about them.
  • I know no commit can ever be changed. But let me copy the commits that I do have, and before I git commit each copied commit, let me change something first.

Since a commit is a snapshot plus metadata, there are only two things you can change: the snapshot, or the metadata, or in some cases both. The goal of a rebase is therefore to take your existing commits, that are "mostly OK", and make new and improved commits that are somehow better.

To do this, Git needs two pieces of information:

  • What commits should be copied?
  • Where should the copies go?
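For a simple linear branch, you can preview both answers yourself before rebasing. This sketch assumes a scratch setup (the repository names theirs and yours, the commit helper, and the example identity are invented here) and uses the remote-tracking name origin/develop as the target.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"

# Hypothetical helper: make one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

git init -q theirs
(
  cd theirs
  git symbolic-ref HEAD refs/heads/main
  commit H
  git checkout -q -b develop
  commit I; commit J
  git checkout -q main
)
git clone -q theirs yours
cd yours
git checkout -q -b feature/foo
commit K; commit L

# What would be copied: commits reachable from feature/foo but not from the target.
to_copy=$(git log --reverse --format=%s origin/develop..feature/foo | paste -sd ' ' -)
# Where the copies would go: after the commit the target name selects.
target=$(git log -1 --format=%s origin/develop)
echo "commits to copy: $to_copy"
echo "copies go after: $target"
```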

When you run git rebase develop, Git gets—or tries to get—both answers from that one name, develop. That name:

  1. must exist;
  2. must name a commit.

You got an error because develop, as a name, did not exist:

fatal: invalid upstream 'develop'

Git isn't very good at its error messages, but in this case, that's what it meant.
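The failure is easy to reproduce in a scratch clone, and so is one way around it: name the remote-tracking name, which does exist. This is a sketch under assumptions (the repository names theirs and yours, the commit helper, and the example identity are made up), not a transcript of your repository.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"

# Hypothetical helper: make one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

git init -q theirs
(
  cd theirs
  git symbolic-ref HEAD refs/heads/main
  commit H
  git checkout -q -b develop
  commit I; commit J; commit N; commit O
  git checkout -q main
)
git clone -q theirs yours
cd yours
git checkout -q -b feature/foo
commit K; commit L

# There is no local name "develop" in this clone, so this rebase fails.
status=0
message=$(LC_ALL=C git rebase develop 2>&1) || status=$?
echo "exit status: $status"
echo "$message"

# The remote-tracking name exists and names a commit, so this one works.
git rebase -q origin/develop
base=$(git merge-base feature/foo origin/develop)
tip=$(git rev-parse origin/develop)
copied=$(git rev-list --count origin/develop..feature/foo)
echo "feature/foo now has $copied commits on top of origin/develop"
```

After the second rebase, the copies of K and L sit directly on top of the commit that origin/develop selects.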

When you ran:

git checkout develop

you invoked the --guess mode. You had:

          K--L   <-- feature/foo (HEAD), origin/feature/foo
         /
...--G--H   <-- main, origin/main
         \
          I--J--N--O   <-- origin/develop

Note that there is no develop here. Your Git will, at this point, be almost ready to give you an error message (another of Git's low-quality ones that doesn't mention the real problem), but since you didn't use --no-guess, git checkout takes one last stab at solving the lack of name develop. It looks through your remote-tracking names and finds that origin/develop does exist. It changes your:

git checkout develop

command into:

git checkout -b develop --track origin/develop

which tells Git:

  1. create name develop pointing to the same commit as origin/develop;
  2. check out that branch name, having just created it;
  3. set the upstream for develop to origin/develop.

That is, this --guess mode is a three-in-one command. Its final result is this:

          K--L   <-- feature/foo, origin/feature/foo
         /
...--G--H   <-- main, origin/main
         \
          I--J--N--O   <-- develop (HEAD), origin/develop
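You can watch the three-in-one behavior in a scratch clone. The sketch below assumes the default --guess behavior is in effect and that there is only one remote; the repository names theirs and yours, the commit helper, and the example identity are invented for illustration.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"

# Hypothetical helper: make one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

git init -q theirs
(
  cd theirs
  git symbolic-ref HEAD refs/heads/main
  commit H
  git checkout -q -b develop
  commit I; commit J
  git checkout -q main
)
git clone -q theirs yours
cd yours

# Before: is there a local branch name "develop"?
if git show-ref --verify --quiet refs/heads/develop; then
  had_develop=yes
else
  had_develop=no
fi

# No local develop, but origin/develop exists, so checkout guesses.
git checkout -q develop

current=$(git symbolic-ref --short HEAD)
upstream=$(git rev-parse --abbrev-ref 'develop@{upstream}')
echo "had develop before checkout: $had_develop"
echo "current branch: $current"
echo "upstream of develop: $upstream"
```

Once the name develop exists, git rebase develop has something to resolve, although it will only be as current as your last update of that local branch.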

Your next command:

git pull

can now run. It means:

  • git fetch (from origin), which probably didn't get anything new this time since your earlier git fetch was pretty recent;
  • git merge origin/develop, which probably did not have anything to do and would say Already up to date.

Your subsequent git merge would do a real merge, given the drawings I made, and that could well get merge conflicts. You probably wanted to rebase instead, but that gets particularly tricky due to the commit-copying steps.

Bottom line

It's kind of a shame that using Git gets so difficult. Parts of Git really are very simple, and once you get past the trickier ones—the whole idea of working tree and index/staging-area, for instance, or how merge works—the rest starts getting simpler again. Rebase is really just repeated git cherry-pick, and each cherry-pick is a kind of merge (using a forced merge base). Some git merge commands do a fast-forward operation instead of a merge, but fast-forwarding is really just a branch-label-movement trick and is pretty simple itself.
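The claim that rebase is repeated cherry-pick can be checked on a small example: copy the commits by hand onto a scratch branch, then rebase the original branch, and compare the resulting snapshots. This is a sketch for a conflict-free linear case only; the branch name by-hand, the repository names theirs and yours, the commit helper, and the example identity are assumptions.

```shell
set -e
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.com
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.com
cd "$(mktemp -d)"

# Hypothetical helper: make one commit that adds a file named after its message.
commit() { echo "$1" > "$1.txt" && git add "$1.txt" && git commit -q -m "$1"; }

git init -q theirs
(
  cd theirs
  git symbolic-ref HEAD refs/heads/main
  commit H
  git checkout -q -b develop
  commit I; commit J
  git checkout -q main
)
git clone -q theirs yours
cd yours
git checkout -q -b feature/foo
commit K; commit L

# By hand: start a scratch branch at origin/develop and cherry-pick K and L.
git checkout -q --no-track -b by-hand origin/develop
git cherry-pick main..feature/foo >/dev/null
picked_tree=$(git rev-parse 'HEAD^{tree}')

# With rebase: let Git copy the same two commits onto origin/develop.
git checkout -q feature/foo
git rebase -q origin/develop
rebased_tree=$(git rev-parse 'HEAD^{tree}')

echo "cherry-picked snapshot: $picked_tree"
echo "rebased snapshot:       $rebased_tree"
```

The two final snapshots match here; the commit hash IDs can still differ, because the metadata (such as committer timestamps) is not necessarily the same.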

Unfortunately, there are a huge number of bad assumptions (such as that git pull is for beginners who have not yet learned all the intricacies of fetching and merging), and there is that first lump of "stuff" to get past.

Once you do get past it, though, you'll realize that "commits are snapshots plus metadata" really cuts through a lot of stuff.
