Why does git pull without arguments produce different results than git pull origin <branch name>?

My understanding from the git pull documentation is that

$ git pull

and

$ git pull origin master

should produce the same results, assuming the current branch is master. However, in the former case I pull down one object while in the latter I pull down many more (over 100 today). Clearly these two commands are not synonymous, so which is the correct form for just making sure I have the latest copy of master before I base a new ticket branch off of it? Thanks.

Answer:

Some of this is just overall poor design in the original git pull, which now has to be preserved for backwards compatibility.

The thing to know here is that git pull means:¹

run git fetch; then
run a second Git command.
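You can run those two steps by hand as separate commands, which also lets you look at what arrived before you choose the second one. The sketch below builds a throwaway sandbox so it runs anywhere Git is installed. The directory names upstream and work and the demo identity are illustrative assumptions, and git -C needs Git 1.8.5 or later. Here the second command is a fast-forward-only merge. A rebase would go in the same place.

```shell
#!/bin/sh
# Hypothetical sandbox: "upstream" stands in for the other Git, "work" for your clone.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Keep personal/system config out of the way and give commits an identity.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/file
git -C upstream add file
git -C upstream commit -q -m one
git clone -q upstream work

# Someone adds a commit upstream after we cloned.
echo two >> upstream/file
git -C upstream commit -q -am two

cd work
# Step 1: fetch only; your own master does not move yet.
git fetch -q origin
# Inspect what arrived before choosing the second command.
git log --oneline master..origin/master
# Step 2: here a fast-forward-only merge (a rebase could go here instead).
git merge -q --ff-only origin/master
```

Run it with sh. Everything it creates stays under the directory that mktemp made.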

You (usually) have your choice of second Git command: rebase, or merge. You have to decide which one you'll use before you see what git fetch fetches, which I consider a fairly major flaw and a reason to avoid git pull.² You also have some choice of what arguments, if any, git pull passes to the git fetch step:

If you specify a remote (origin) and branch-name (master, in git pull origin master), git pull runs git fetch origin master.
If you don't specify both a remote and a branch name, git pull looks at branch.<branch>.remote and branch.<branch>.merge, where <branch> is your current branch. These must both be set (and you must be on some branch at this point as well, so that <branch> makes sense). Git then passes the remote name from the first setting to git fetch.

We must then go over to the git fetch documentation to see what it does with the remote and—if present—branch name or refspec arguments. The remote is straightforward: it sets where the URL comes from, for instance; if the remote is origin, your Git looks up the result of git config --get remote.origin.url to know how to contact the other Git.
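You can repeat the configuration lookups described above by hand with git config. This is a minimal sandbox sketch. The names upstream and work and the demo identity are illustrative assumptions, and git -C needs Git 1.8.5 or later. In a real repository you would run only the four lookups at the end.

```shell
#!/bin/sh
# Hypothetical sandbox: "upstream" plays the other Git, "work" is a fresh clone of it.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Keep personal/system config out of the way and give commits an identity.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m one
git clone -q upstream work   # clone records the upstream settings for master
cd work

# Roughly the chain of lookups a bare "git pull" depends on.
branch=$(git symbolic-ref --short HEAD)             # current branch
remote=$(git config --get "branch.$branch.remote")  # which remote to fetch from
merge=$(git config --get "branch.$branch.merge")    # their branch to merge or rebase with
url=$(git config --get "remote.$remote.url")        # where that remote lives
echo "branch=$branch remote=$remote merge=$merge url=$url"
```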

The refspec argument is where things get very squirrelly:

A fetch with no refspec argument uses the defaults from remote.<remote>.fetch. The standard one for origin is +refs/heads/*:refs/remotes/origin/* (the leading + is the force flag). This refspec means: obtain all the commits found by all of their branches, and rename those branches to my remote-tracking origin/* names. You can, however, have a single-branch clone or some other highly unusual configuration, where something different happens.
A fetch with a refspec argument causes Git to look for names that match the refspec. Here, things are complicated. A refspec has a general form in which there is an optional force flag, a source part, a colon, and a destination part. The source and destination can contain * wildcards (with certain restrictions; the exact restrictions depend on your Git version). They can be fully qualified references, beginning with refs, or you can let Git do certain matching operations as described in the gitrevisions documentation.
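To see refspecs in action, this sandbox sketch first prints the default refspec that git clone writes. It then fetches with a hand-written wildcard refspec whose destination is a made-up namespace, refs/remotes/mirror/*. The names mirror, upstream, work and dev and the demo identity are purely illustrative assumptions, and git -C needs Git 1.8.5 or later.

```shell
#!/bin/sh
# Hypothetical sandbox: "upstream" plays the other Git, "work" is a fresh clone of it.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Keep personal/system config out of the way and give commits an identity.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m one
git -C upstream branch dev
git clone -q upstream work
cd work

# The default refspec written by git clone; the leading + is the force flag.
default=$(git config --get-all remote.origin.fetch)
echo "$default"

# A hand-written refspec: source refs/heads/* (their branches), destination
# refs/remotes/mirror/* (a made-up namespace on our side).
git fetch -q origin '+refs/heads/*:refs/remotes/mirror/*'
git for-each-ref --format='%(refname)' refs/remotes/mirror/
```

The last command should list one mirror/ name for each of their branches, master and dev.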

If we avoid covering all possibilities and just concentrate on using a simple branch name as a refspec, we find that the way git fetch deals with this is:

1. It limits what it will fetch to any commits and other objects needed to synchronize with the commit to which the given branch name points in the other Git. That is, if their Git has name master meaning commit defb9a3..., your Git will make sure that you have this commit in your repository, plus all the parents needed and any other objects needed, by the time the git fetch finishes. But if their Git has other commits and objects that would be needed to update other remote-tracking names, your Git doesn't pick these up.
2. It then fetches as usual and writes the corresponding information to .git/FETCH_HEAD, where git pull can see it.
3. In Git version 1.8.4 and later, it opportunistically updates the remote-tracking name (origin/master in this case). It does this by reading through the remote.origin.fetch lines from the configuration. In older versions of Git, step 3 does not occur.

This is a long-winded, but reasonably precise, way to say that git fetch origin fetches all their branches by default, but git fetch origin master fetches only their master. There are numerous possible special exceptions to this short version, but they're relatively rare.
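You can watch the short version, and the FETCH_HEAD and opportunistic-update steps behind it, in a throwaway sandbox. In this sketch, both of their branches gain a commit after the clone. git fetch origin master then records only master in FETCH_HEAD and leaves origin/dev alone. A plain git fetch origin afterwards picks up dev as well. The names upstream, work and dev and the demo identity are illustrative assumptions. The origin/master check assumes a Git new enough to do the opportunistic update.

```shell
#!/bin/sh
# Hypothetical sandbox: "upstream" plays the other Git, "work" is your clone.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Keep personal/system config out of the way and give commits an identity.
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m one
git -C upstream branch dev
git clone -q upstream work

# After the clone, both of their branches move on.
git -C upstream commit -q --allow-empty -m master-two
git -C upstream checkout -q dev
git -C upstream commit -q --allow-empty -m dev-two
git -C upstream checkout -q master

cd work
git fetch -q origin master
fetch_head="$(git rev-parse --git-dir)/FETCH_HEAD"
cat "$fetch_head"                                           # entry for their master
named_entries=$(grep -c . "$fetch_head")                    # count FETCH_HEAD lines
master_after_named=$(git log -1 --format=%s origin/master)  # opportunistic update
dev_after_named=$(git log -1 --format=%s origin/dev)        # not fetched, unchanged
git fetch -q origin
dev_after_plain=$(git log -1 --format=%s origin/dev)        # now updated
echo "origin/master: $master_after_named"
echo "origin/dev after 'fetch origin master': $dev_after_named"
echo "origin/dev after 'fetch origin': $dev_after_plain"
```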

¹In the not too distant past, git pull was a shell script and you could just read it and see where it ran git fetch. This made things a lot clearer. Now it's a big hairy C program. Does it still work exactly the same? Well, we can hope that bugs get fixed, so, probably not exactly the same. But the overall behavior is supposed to remain.

²It's not that you can't recover, if git fetch fetches something you didn't want to rebase-or-merge-with. It's that it's easier to run several commands, with git log inserted in the middle whenever that seems appropriate. Then there's no need to recover from a mini-disaster.

Which one is better?

This is one of those questions that doesn't have an answer, really. It's like asking whether chocolate ice cream is a better flavor than strawberry, or whether broccoli or cauliflower is a better vegetable.³ They're different, and whether one is better for you, for one specific purpose, depends on both you and the purpose.

We can, however, note that if you let git fetch fetch everything now, a later git fetch will generally have less to fetch. If you restrict git fetch right now to just their master, and later you also need commits from their dev and have to run git fetch origin dev, you might have been better off running a single git fetch origin that got everything.

If you plan to make use of n remote-tracking names, where n > 1, a simple git fetch that just gets everything is convenient.

³It is, however, clear that of all cruciferous vegetables, Brussels sprouts are the worst.