Confusion over origin, master and "branch" names with remote repository

Git is great and I've been using it for a few years now as a basic user, but some things still puzzle me about the naming used. I'm hoping that someone can help explain what I'm doing in simple terms.

Here's my normal workflow:

I create a local repository with:

git init

I add and commit to it as usual.

git add *
git commit -m "Some message"

After some time, I realise that the project is going to be important, or that I need to share it, and that I need to have it on a remote repository. (Yes, in an ideal world, I'd first create it remotely, then clone it locally.)

I create a new remote repository on Bitbucket (for example) called someproject.

I type this in my local repository folder:

git remote add somename https://******@bitbucket.org/****/someproject.git

I check if it's OK:

git remote -v

I then run this command so that I can overwrite the remote repository:

git push --set-upstream somename master --force

What I don't understand is the difference between someproject, somename, master, and origin. It all works but I don't understand what I'm doing. I've tried reading a few manuals but I'm always confused.

Thanks if you can help!

Answer:

Git's terminology is a mess. It's no wonder it confuses everyone.

Here are the basics:

- A repository is a collection of commits, plus information Git needs to find those commits.
- A commit is numbered, but its number is large, ugly, and random-looking. It is a unique-to-that-commit hash ID, which Git also calls an object ID or OID. Once any one specific commit has "used up" some hash ID, no other commit anywhere, in any Git repository, is ever allowed to re-use that hash ID.¹
- Each commit contains two things:
  - A commit holds a full snapshot of every file, rather like an archive (tar or rar or WinZip or whatever). To keep the repository from becoming enormously fat, these archives contain the files in a special, read-only, Git-only, compressed and de-duplicated format. So when commits re-use earlier files (most commits mostly re-use most files), those files don't get stored again; they just get re-used. A commit that exactly matches some previous commit uses no space at all to store the files. (It uses a tiny bit of space to store the metadata mentioned in the next point.)
  - A commit stores some metadata, or information about the commit itself. There's a who: a name and email address. There's a when: a date-and-time stamp. (In fact, there are two of each of these: one for the author and one for the committer.) There's a why: the log message that whoever made the commit put in, to explain why they made that commit. And, crucially for Git's own operation, there's a list of parent hash IDs, for the commits that come right before this particular commit. This list usually has exactly one entry in it. The entries in this list, plus the commits themselves, are the history in the repository: that's all there is!
- In order to find commits (because the hash IDs look random), a repository also contains some names. These names are split up into categories:
  - Branch names are probably the ones you use the most. A branch name has a few special properties, which we'll list in a bit. But in fact, a branch name just holds the hash ID of one (1) commit!
  - Tag names like v2.1 or v1.3.5 identify some particular commit: they hold the hash ID of one commit, like a branch name, although some tags (which Git calls annotated tags) do this indirectly so that you can add a bit more information, such as a GPG signature verifying that you assert that this particular commit is particularly good and/or useful.
  - Other names: there are a whole host of these, and we'll get back to them in a bit.

That's everything that's in a repository that you will see if you can somehow view the repository from afar. For instance, if you look at a Git repository stored on GitHub, this is what you'll see directly.
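
A toy Python model can make the hash-ID and de-duplication ideas concrete. Treat it as a sketch: `blob_id` follows Git's actual blob hashing scheme in a default SHA-1 repository (SHA-1 over a `blob <size>` header, a NUL byte, then the file content), but `commit_id` uses a made-up, simplified serialization, so the commit IDs it produces will not match what real Git computes. The names `store`, `save_blob` and `commit_id` are hypothetical.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Hash file content the way Git hashes a blob in a SHA-1 repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical object store keyed by ID: identical content is stored only once.
store = {}

def save_blob(content: bytes) -> str:
    oid = blob_id(content)
    store.setdefault(oid, content)  # re-using an existing blob adds nothing
    return oid

def commit_id(snapshot: dict, parents: list, message: str) -> str:
    """Made-up commit serialization (NOT Git's real format): files, parent IDs, message."""
    lines = [f"file {name} {oid}" for name, oid in sorted(snapshot.items())]
    lines += [f"parent {p}" for p in parents]
    lines += ["", message]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()

# First commit: one file, no parents.
snap_a = {"README": save_blob(b"hello\n")}
a = commit_id(snap_a, [], "first commit")

# Second commit: README is unchanged (so it is de-duplicated), plus one new file.
snap_b = {
    "README": save_blob(b"hello\n"),
    "main.c": save_blob(b"int main(void) { return 0; }\n"),
}
b = commit_id(snap_b, [a], "add main.c")

print(len(store))                            # 2 blobs stored, though save_blob ran 3 times
print(snap_a["README"] == snap_b["README"])  # True: same content, same ID
```

Because the parent ID is part of what gets hashed, changing anything in the history changes every later commit's ID as well.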

If you can look at it closer up, though, a repository also holds some configuration, which you can tweak with git config and with other more specialized Git commands, such as git remote or git branch --set-upstream-to. You'll have these in Git "clones" that you make locally. A clone is just a copy of a repository, but there are some things to know about a clone:

Each clone has its own private configuration. Your git config settings won't necessarily match those of someone else's clone, or a GitHub clone, or whatever.
Each clone also has its own private branch names. In fact, all the names are specific to that clone, but Git normally shares the tag names, so that if they (whoever "they" are) have a v1.3.5, you'll have a v1.3.5 too.

Note that the hash IDs are universal: if you have commit a123456 and they have commit a123456, that's the same commit by definition (or at least, it is if you quote the whole hash ID—a123456 is too short to be a full hash ID, but the real ones are too painful for humans to bother with). The branch names are not universal, and my main or master in my clone can hold a different commit hash ID than your main or master. I can have branch names that you don't, and vice versa.
We like to connect clones to each other now and then, so that we can transfer commits. That's because we like to make new commits. When we do this, we merely add to the repository: all the old commits are still there, we've just added a new commit, with a new unique hash ID.

¹This kind of accidental re-use of a hash ID, producing what I like to call a "doppelgänger commit", would break Git—not to the point of exploding the universe or anything, but the two different Git repositories would be unable to talk with each other. Not exactly tragic, but that's why the hash IDs are so big and ugly, so that this doesn't happen.

This is where the special property of branch names comes in

Before we dive into remotes and such, let's draw some commits in a repository. Remember, each commit has some big ugly hash ID, and each commit has a list of hash IDs stored inside it—in its metadata—that's usually just one entry long. So this makes a backwards-looking chain of commits.

Since hash IDs are too painful for humans to use, I like to use single uppercase letters to stand in for them. Let's call our first commit ever commit A, and our second commit ever commit B, and so on. We'll draw these like so:

A <-B <-C

Commit C points backwards to earlier commit B. Commit B points backwards to A. A has an empty list of previous commits: it doesn't point anywhere because there's no earlier commit to point to, so that's where history ends—or starts, depending on whether you work backwards (like Git) or forwards.

Note that commit C has a full snapshot of every file. So does earlier commit B. It's by comparing the snapshots in B vs C that Git can tell you what changed in commit C. Git can find B from C using the hash ID stored in C's metadata. But finding C from nothing is much harder. Git could root around through every commit in the repository, looking for the one at the end that no other commit points back to, but this would take a while.

So, the trick Git uses here is to have a branch name point to the last commit, like this:

A--B--C   <-- main

Git can now read the name main, which contains C's real hash ID, to find C. Then Git reads C's metadata to find B, and B's metadata to find A, and A's metadata to discover that A is the end (or start) of history.
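
The same lookup can be sketched in a few lines of Python. This is a toy model, assuming single letters stand in for real hash IDs and each commit has at most one parent; the names `parents`, `branches` and `history` are hypothetical, not part of Git.

```python
# commit ID -> list of parent IDs; A is the root commit with an empty list
parents = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
}
branches = {"main": "C"}  # a branch name holds exactly one commit ID

def history(tip):
    """Walk backwards from `tip` through first parents until the list is empty."""
    result = []
    commit = tip
    while commit is not None:
        result.append(commit)
        ps = parents[commit]
        commit = ps[0] if ps else None  # no parent: we've reached the start of history
    return result

print(history(branches["main"]))  # ['C', 'B', 'A']
```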

With this in mind, let's look at how Git adds a new commit. We start by telling Git to extract commit C. The files in C are all read-only, Git-only things that are useless for doing any actual work. Git has to copy them out of the commit "archive", turning them into ordinary everyday files. That's the git checkout or git switch command in operation:

git switch main

tells Git to extract the commit to which the name main points, i.e., commit C.

We'll skip over some very important stuff about Git's index / staging-area here, for space reasons, and just go directly to what happens when you run git commit to make a new commit. Git will:

Gather up all the metadata it needs to make the new commit: this includes your name and email address, the current date-and-time, your log message, and, crucially, the actual hash ID of the current commit C.
Snapshot all the files that go into the new commit. Unchanged files are simply re-used: since all snapshots are entirely read-only, no existing stored file will ever change. Only updated files require new stored copies. Git has actually already prepared these (in the index / staging area we skipped over), so git commit runs very fast (compared to other pre-Git version control systems, anyway).
Write all this out, obtaining a new unique hash ID, which we'll call D:
```
A--B--C   <-- main
       \
        D
```
The sneaky bit: Git now writes D's hash ID into the name main.

The end result is that now main points to D, in your repository:

A--B--C
       \
        D   <-- main

Nobody else's repository has D yet, so they cannot possibly have any name—main or anything else—that points to (their missing copy of) D.

This is why your branch names are yours. Your Git is going to move them around as you make new commits. Your repository's branch names are specific to your repository, and some commits—the ones you've made, but not yet sent off to other Git repositories—are also only available to you.
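
Here is a rough Python sketch of what git commit does with the branch name. It is a toy, under stated assumptions: the snapshot is represented by a made-up ID string, the metadata is a simplified string rather than a real commit object, and the new ID is cut to seven hex digits only so it is easy to read. What it does show is the order of operations described above: record the current commit as the parent, write the new commit, then move the current branch name.

```python
import hashlib
import time

parents = {"A": [], "B": ["A"], "C": ["B"]}  # commit ID -> list of parent IDs
branches = {"main": "C"}                     # branch name -> one commit ID

def commit(branch, snapshot_id, message):
    """Toy version of git commit while on `branch` (not Git's real object format)."""
    parent = branches[branch]  # the current commit's ID goes into the metadata
    metadata = f"snapshot {snapshot_id}\nparent {parent}\ntime {time.time()}\n\n{message}"
    new_id = hashlib.sha1(metadata.encode()).hexdigest()[:7]  # write it out, get an ID
    parents[new_id] = [parent]  # the new commit points back to the old one
    branches[branch] = new_id   # the sneaky bit: the branch name now holds the new ID
    return new_id

# "tree-1" is a hypothetical stand-in for the snapshot.
d = commit("main", "tree-1", "my new commit")
print(parents[d])             # ['C']
print(branches["main"] == d)  # True
```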

Note that more than one branch name might point to any given commit. Suppose, for instance, we're up to this point:

...--G--H   <-- main

and we now decide we'd like to create a new branch named dev for development:

...--G--H   <-- dev, main

We now have two names, both of which select the commit whose hash is H. Git needs to know which name we're using, so in our drawings, we add the special name HEAD, like this:

...--G--H   <-- dev, main (HEAD)

This says that we're using the name main to work with commit H right now. If we run:

git switch dev

(or git checkout dev) we get:

...--G--H   <-- dev (HEAD), main

We're still using commit H, but now we're doing so through the name dev. Now when we add a new commit I, we get:

          I   <-- dev (HEAD)
         /
...--G--H   <-- main

The name main doesn't move, and the name dev does move, because we're "on" branch dev. Git writes our new commit's hash ID into the current branch name, whenever we make a new commit. If we make another new commit now, we get:

          I--J   <-- dev (HEAD)
         /
...--G--H   <-- main

If we switch back to main, we get:

          I--J   <-- dev
         /
...--G--H   <-- main (HEAD)

We still have the two new commits. Git does, however, remove from our work area all the files that go with commit J, and put in their place all the files that go with commit H. So now we're working with commit H, not commit J; we're now on branch main, as git status will say. If we make two more commits, we get:

          I--J   <-- dev
         /
...--G--H
         \
          K--L   <-- main (HEAD)

and now we have what people often call "branches".
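
The HEAD bookkeeping above fits in a small Python model. It is a sketch, assuming HEAD is just a variable holding a branch name and that new hash IDs are handed out as the letters I, J, K, L by a hypothetical `new_ids` iterator; it leaves out the work-area file updates that git switch also performs.

```python
parents = {"G": [], "H": ["G"]}       # commit ID -> parent IDs
branches = {"main": "H", "dev": "H"}  # two names selecting the same commit H
head = "main"                          # HEAD holds a branch *name*, not a commit ID
new_ids = iter("IJKL")                 # hypothetical stand-ins for new hash IDs

def switch(name):
    """Attach HEAD to another branch name (file extraction omitted)."""
    global head
    if name not in branches:
        raise KeyError(f"no branch named {name!r}")
    head = name

def commit():
    """Make a commit on the current branch: only the name HEAD is attached to moves."""
    new_id = next(new_ids)
    parents[new_id] = [branches[head]]
    branches[head] = new_id
    return new_id

switch("dev")
commit()  # I
commit()  # J
switch("main")
commit()  # K
commit()  # L
print(branches)                    # {'main': 'L', 'dev': 'J'}
print(parents["I"], parents["K"])  # ['H'] ['H']
```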

The word branch is very badly overloaded in Git

We have branch names, like main and dev. These contain one hash ID each, and each one has these special properties:

We can get "on" the branch, with git checkout or git switch. This attaches the special name HEAD to the branch name, as in the drawings above, and makes that the current branch. We are now "on the branch".
We can make a new commit while "on the branch", and when we do, our new commit becomes the latest commit "on the branch".

But the word "branch" also means the series of commits ending with the last one as found by the branch name. Commits I-J are part of "branch dev". Commits K-L are part of "branch main".

What about the commits up through H? Those are, in fact, on both branches. They're "part of" both! So a commit can be on more than one branch at a time.

This makes the word branch in Git ambiguous at best, and at worst, useless (because nobody knows what we mean). It's like being at a party where everyone has the same name. "Hey Bruce! Bruce told me to inform Bruce that Bruce can't come, because he's visiting Bruce at the hospital. Will you go tell Bruce to tell Bruce?" (Sometimes it's clear from context, but when it's not, be sure to use another word, or be more explicit, e.g., "branch name".)
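
One way to pin down that second meaning is to say that the commits on a branch are all the commits reachable by walking backwards from the commit the branch name holds. The Python below is a toy sketch of that definition, using the I-J / K-L drawing from earlier; `commits_on` is a hypothetical helper, not a Git command, though git log with a branch name walks the same set.

```python
parents = {
    "G": [], "H": ["G"],
    "I": ["H"], "J": ["I"],  # found via dev
    "K": ["H"], "L": ["K"],  # found via main
}
branches = {"dev": "J", "main": "L"}

def commits_on(branch):
    """Every commit reachable backwards from the branch name's commit."""
    seen, stack = set(), [branches[branch]]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])  # also handles merge commits with two parents
    return seen

dev, main = commits_on("dev"), commits_on("main")
print(sorted(dev & main))  # ['G', 'H']: on both branches at once
print(sorted(dev - main))  # ['I', 'J']
print(sorted(main - dev))  # ['K', 'L']
```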

Connecting one Git to another Git: the remote

Once we've made new commits—or someone else has made new commits that we'd like to get into our own clone—we find we want to connect two different Git repositories together. In the ancient past, we had to do this by typing in a URL every time. This quickly became painful, so Git invented a lot of different ways to get around that. One of them took root.

To have our Git call up some other Git, we normally use a thing Git calls a remote. A remote is just a short name that—at one level anyway—just stores a URL. When we first clone a repository, with:

git clone ssh://git@github.com/user/repo.git

for instance, our Git software creates our new clone with a single remote already set up in it. This remote's standard name is origin.² From now on, you can have your Git software call up that same repository again, using this name:

git fetch origin

calls up the same Git from which you did your clone and gets any new commits that they have, that you don't. Meanwhile:

git push origin <insert some arguments here, dropping the angle brackets>

sends new commits that you have that they don't.

Except for the transfer direction, these operations are very similar. But there is one big difference. We'll get to that in a little bit.
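
Since a remote is, at this level, just a short name for a URL, a Python dictionary is a fair toy model. This is a sketch: `remote_add` and `url_for` are hypothetical stand-ins for what git remote add records and what git fetch or git push looks up, and the URLs are placeholders. It also shows that origin is only a conventional name: a remote called somename, as in the question, works the same way, and the repository name chosen on the hosting site (someproject in the question) only ever appears inside the URL.

```python
remotes = {}  # remote name -> URL (real Git keeps this in the clone's configuration)

def remote_add(name, url):
    """Toy analogue of `git remote add <name> <url>`: remember a short name for a URL."""
    if name in remotes:
        raise ValueError(f"remote {name} already exists")
    remotes[name] = url

def url_for(name):
    """The lookup `git fetch <name>` or `git push <name> ...` does before connecting."""
    return remotes[name]

# git clone creates "origin" for you; any other name is equally valid.
remote_add("origin", "https://example.com/user/repo.git")             # placeholder URL
remote_add("somename", "https://example.com/user/someproject.git")    # placeholder URL
print(url_for("somename"))
```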

Your Git repository has branch names. So does their Git repository.

You can add new commits to your branches. So can they.

When you run git fetch origin to get new commits from their repository, your Git won't touch your branch names. That way, any new commits you added to your own repository can't get lost.

²You can select another name, if you like, with the -o option, at git clone time. The usual rule for most people is: Don't do that, as you'll just make things more painful for yourself.

Remote-tracking names

To make this work, the initial git clone you run copies all their commits but none of their branch names. Instead, having gotten all their commits, your Git takes each of their branch names—main or master, dev, feature/short, whatever—and renames each one. Their main becomes your origin/main. Their dev becomes your origin/dev. Their feature/short becomes your origin/feature/short.

In other words, for each of their branch names, your Git software creates a different name in your repository. It renames their branches. I call these renamed things remote-tracking names; the official Git documentation calls them remote-tracking branch names, which uses that poor overloaded word branch yet again.³
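
The renaming can be sketched as a one-line mapping. Real Git drives it from the fetch refspec that git clone writes into the configuration, `+refs/heads/*:refs/remotes/origin/*`; the Python below is only a simplified model of that rule, with made-up commit letters, and the remote name is a parameter because the prefix comes from whatever the remote is called.

```python
def remote_tracking_names(remote, their_branches):
    """Map each of *their* branch names to <remote>/<name> in *our* repository."""
    return {f"{remote}/{name}": oid for name, oid in their_branches.items()}

theirs = {"main": "C", "dev": "F", "feature/short": "E"}  # their branch names -> commits
print(remote_tracking_names("origin", theirs))
# {'origin/main': 'C', 'origin/dev': 'F', 'origin/feature/short': 'E'}

# With a remote named "somename", their master shows up as somename/master.
print(remote_tracking_names("somename", {"master": "C"}))
# {'somename/master': 'C'}
```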

After doing all this renaming, the initial git clone creates one branch name in your new clone. You choose which name it should create: you run git clone -b dev url, for instance, if you want your Git to create your dev based on their dev. Your Git takes your renamed origin/dev and makes one dev that points to the same commit as your origin/dev, which your clone made point to the same commit that their dev pointed-to. (Whew!) If you don't use -b, your Git asks their Git software what branch name they recommend, and that's normally master or main, whichever they're using.

After that, you can create more branch names. The git switch and git checkout commands have a guess mode (you can turn this off with --no-guess if you want), where if you run:

git switch rudy

and you don't have a branch named rudy, your Git will look for an origin/rudy and if it finds one, create your own rudy based on your origin/rudy, just like your initial git clone did with main or whatever.
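
The guess mode amounts to a simple fallback rule, sketched here in Python. The sketch assumes there is only one remote (real Git has extra rules when several remotes have a matching name), and it skips attaching HEAD and recording the upstream; `switch_with_guess` is a hypothetical helper, and its `guess=False` setting plays the role of --no-guess.

```python
def switch_with_guess(name, branches, remote_tracking, remote="origin", guess=True):
    """Return the commit to use, creating a local branch from <remote>/<name> if needed."""
    if name in branches:
        return branches[name]                        # an ordinary switch
    candidate = f"{remote}/{name}"
    if guess and candidate in remote_tracking:
        branches[name] = remote_tracking[candidate]  # new local name, same commit
        return branches[name]
    raise KeyError(f"no branch named {name!r}")

branches = {"main": "C"}
remote_tracking = {"origin/main": "C", "origin/rudy": "R"}
print(switch_with_guess("rudy", branches, remote_tracking))  # R
print(branches)  # {'main': 'C', 'rudy': 'R'}
```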

So, after you've created a local branch name with the same name as some remote-tracking name, and made some new commits, you can run:

git fetch origin

and your Git will call up the other Git software, at the URL saved under the remote name origin, and get any new commits they have on their branches. Having done that, your Git will create or update your remote-tracking names, origin/*, based on their branch names.

This sort of thing produces the exact same kind of "branching"—here's the overloaded word again—that we saw before:

          I--J   <-- somebranch (HEAD)
         /
...--G--H
         \
          K--L   <-- origin/somebranch

and now you get to figure out what to do about this. The fact is that Git really cares about the commits, not the branch names, here: the origin/somebranch remote-tracking name works just as well to find commit L as any branch name would.
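
A toy Python version of git fetch shows why this situation is safe: fetching only adds commits and only rewrites remote-tracking names. The repository layout here (dicts of commits, branch names and remote-tracking names) is a hypothetical model, and the letters match the drawing above.

```python
def fetch(ours, theirs, remote="origin"):
    """Copy commits we lack, then update <remote>/* names; our branch names are untouched."""
    for oid, parent_ids in theirs["commits"].items():
        ours["commits"].setdefault(oid, parent_ids)  # add-only: nothing is removed
    for name, oid in theirs["branches"].items():
        ours["remote_tracking"][f"{remote}/{name}"] = oid

ours = {
    "commits": {"G": [], "H": ["G"], "I": ["H"], "J": ["I"]},
    "branches": {"somebranch": "J"},
    "remote_tracking": {"origin/somebranch": "H"},
}
theirs = {
    "commits": {"G": [], "H": ["G"], "K": ["H"], "L": ["K"]},
    "branches": {"somebranch": "L"},
}

fetch(ours, theirs)
print(ours["branches"])         # {'somebranch': 'J'}: our work is still there
print(ours["remote_tracking"])  # {'origin/somebranch': 'L'}
print(sorted(ours["commits"]))  # ['G', 'H', 'I', 'J', 'K', 'L']
```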

The key differences between your branch names and your remote-tracking names are:

you can get "on" your branch names, and make new commits and make them update automatically;
you can't get "on" your remote-tracking names: git switch origin/somebranch gives you an error.⁴ Just use git fetch to get these updated.

If and when you find yourself in this situation, it's time to learn about, or use, git merge and/or git rebase (but we won't cover those here).

³Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo

⁴The old git checkout command doesn't give you an error, but puts you in a mode Git newbies should generally avoid, which Git calls detached HEAD mode. Don't do that until you're ready.

`git push` is different

Using git fetch origin—with no additional arguments—has your Git call up the Git over at origin and get all their new commits and update all your remote-tracking names.⁵ This is fairly easy, and almost always harmless.⁶ That is, you can run git fetch at any time: if there are no new commits to get, it quickly does nothing, and if there are new commits, it gets them and updates your remote-tracking names, and all is good.

But with git push you must—well, sometimes—name not only the remote, origin, but also a branch name, e.g.:

git push origin feature/tall

to push your new feature branch. This:

sends to origin commits that you have on your feature/tall branch, that they don't have anywhere (it doesn't send new commits on your dev though); then
asks them to set their feature/tall branch name. This normally takes the form of a polite request.

That is, you don't have them set some reserved-to-you remote-tracking name. You have them set a branch name in their repository.

If they are already using that branch name for something else, they'll usually refuse your request. The jargon-y error they give you for this is:

! [rejected]        feature/tall -> feature/tall (fetch first)

What this means is: I did not obey your polite request, because a branch name can only remember one commit. The one commit you asked me to remember would cause me to forget some other commits. That is, they may have had:

...--G--H--K--L   <-- feature/tall

and you asked them to set their repository up with:

...--G--H--I--J   <-- feature/tall
         \
          K--L   ???

If they do that, they'll stop being able to find commits K-L.

Sometimes—especially when using git rebase—this kind of thing is exactly what you want them to do, and when you do want them to do that, you have to modify your git push command. Instead of ending with a polite request, you have to end that git push with a forceful command: Set your branch! Now! DO IT NOW! Or something along these lines.

Since this is just covering the basics—all of the above are basics!—we won't go into detail on when and how you should use the various kinds of "force-push", as Git calls them. Just be aware of the difference: the polite kind of push means don't lose work, and the forceful kind is required if you mean "yes, do lose work, I mean it!" To avoid losing work, avoid the forceful kind of push.
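
The difference between the polite and the forceful push can be modeled as one check on the receiving side: accept the new commit only if the commit the name currently holds is an ancestor of it, so nothing is lost. This Python sketch is a simplification under stated assumptions: `receive_push` and `is_ancestor` are hypothetical helpers, both repositories are assumed to share one `parents` table (as if the commits had already been sent), and real servers may add rules of their own, such as refusing even forced pushes to protected branches.

```python
def is_ancestor(ancestor, descendant, parents):
    """True if `ancestor` can be reached by walking backwards from `descendant`."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def receive_push(their_branches, name, new_tip, parents, force=False):
    """Set their branch name if that loses no commits, or unconditionally if forced."""
    old_tip = their_branches.get(name)
    if old_tip is None or force or is_ancestor(old_tip, new_tip, parents):
        their_branches[name] = new_tip
        return "accepted"
    return "rejected"

# The earlier drawing: they have K--L on feature/tall, we send I--J.
parents = {"G": [], "H": ["G"], "K": ["H"], "L": ["K"], "I": ["H"], "J": ["I"]}
theirs = {"feature/tall": "L"}

print(receive_push(theirs, "feature/tall", "J", parents))              # rejected
print(receive_push(theirs, "feature/tall", "J", parents, force=True))  # accepted; K-L lost
# A brand-new name (as in an empty repository) cannot lose anything.
print(receive_push({}, "main", "C", {"A": [], "B": ["A"], "C": ["B"]}))  # accepted
```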

⁵If you make a so-called single-branch clone, this stops working (on purpose). Don't do that until you're ready to learn about single-branch clones.

⁶The word almost is only here to account for the occasional horrible mistake, where someone accidentally pushes a commit that contains a multi-hundreds-of-gigabytes file to some corporate server. (GitHub would reject this push so it won't happen there.) Then you, a hapless Git newbie, run git fetch and your Git downloads this multi-gigabyte commit and it uses up most of your disk space on your laptop, or something. That's not exactly harmless—and since Git doesn't ever like to give up a commit, it can be a pain to recover from.

In your particular case, you do mean to lose work (on GitHub)

Now, let's note that for your particular setup here, you created a repository on your laptop, say, with git init, and did some work locally. This made a bunch of commits.

Then you said to yourself: "Hey, I think I'd like to publish this project." So you went over to (say) GitHub, using their web interface. You had them create, for you, a repository.

When GitHub creates a repository, they optionally stick a first commit into it. The first commit in any repository (the one we saw above as "commit A") is a little weird because a branch name cannot exist until there is a commit. So GitHub will offer to make your repository with a first commit in it.

This first commit isn't your first commit, in your repository on your laptop. It's a different first commit. So you have, on your laptop:

A--B--C   <-- main

Meanwhile over on GitHub (or wherever) you have created a new repository and had GitHub create a new commit there—let's call this one N for New—and that's the one commit in the GitHub repository:

N   <-- main

You run:

git remote add origin ssh://git@github.com/you/repo.git

for instance, to set up the remote name origin. Then you run:

git push origin main

Your Git will call up GitHub's software. They'll connect to you/repo.git over there. That repository has one commit, N, found by the branch name main.

Your Git will send over commits you have that they don't: that's the A-B-C chain. Then your Git will politely ask that they set their main to point to commit C.

They will refuse, because that would lose commit N. Do you want commit N? If you do, get it from them (git fetch) and do something with it. If not, just tell them: yeah, OK, lose that commit, with git push --force.

In the future, consider having them create an empty repository

GitHub, among other hosting providers, will allow you to create a totally empty repository. This has no commits and no branches, so that when you run your initial git push, you send over your commits and ask them to create a branch. This won't lose any commits—there are none!—so they'll accept the polite request, and you're all set.