My understanding from the git pull documentation is that
$ git pull
and
$ git pull origin master
should produce the same results, assuming the current branch is master. However, in the former case I pull down one object while in the latter I pull down many more (over 100 today). Clearly these two commands are not synonymous, so which is the correct form for just making sure I have the latest copy of master before I base a new ticket branch off of it? Thanks.
CodePudding user response:
Some of this is just overall poor design in the original git pull, which now has to be preserved for backwards compatibility.
The thing to know here is that git pull means:[1]
- run git fetch; then
- run a second Git command.
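The two steps can also be run by hand, which leaves room to look at what arrived before picking the second command. The following is a minimal sketch, not what git pull does internally: the sandbox directories (upstream, clone) and the Demo identity are made up for illustration, and a fast-forward merge is assumed as the second command.

```shell
set -eu
# Hypothetical identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway sandbox: an "upstream" repository and a clone of it.
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git commit -q --allow-empty -m "first"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on setup
git clone -q "$work/upstream" "$work/clone"
git commit -q --allow-empty -m "second"   # upstream moves ahead of the clone

cd "$work/clone"
git fetch -q origin                        # step 1: the fetch half of a pull
git log --oneline "HEAD..origin/$branch"   # inspect what came in
incoming=$(git rev-list --count "HEAD..origin/$branch")
git merge -q --ff-only "origin/$branch"    # step 2: the chosen second command
```

Swapping the last line for git rebase "origin/$branch" gives the rebase flavor of the same two-step sequence.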
You (usually) have your choice of second Git command: rebase, or merge. You have to decide which one you'll use before you see what git fetch fetches, which I consider a fairly major flaw and a reason to avoid git pull.[2] You also have some choice of what arguments, if any, git pull passes to the git fetch step:
- If you specify a remote (origin) and branch-name (master, in git pull origin master), git pull runs git fetch origin master.
- If you don't specify both a remote and branch-name, git pull looks at branch.<branch>.remote and branch.<branch>.merge, where <branch> is your current branch. These must both be set (and you must be on some branch at this point as well, so that <branch> makes sense). Git then uses the remote name from the first setting for the fetch, which runs without any branch-name argument.
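To see the two settings that a bare git pull relies on, you can read them back with git config. This sketch builds a disposable clone first; the directory names and the Demo identity are placeholders, and the values shown are the ones git clone normally writes for the initial branch.

```shell
set -eu
# Hypothetical identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "first"
git clone -q "$work/upstream" "$work/clone"

cd "$work/clone"
branch=$(git symbolic-ref --short HEAD)           # the current branch
remote=$(git config --get "branch.$branch.remote")
merge=$(git config --get "branch.$branch.merge")
echo "branch.$branch.remote = $remote"
echo "branch.$branch.merge  = $merge"
```

If either setting is missing (for instance on a branch created locally without an upstream), a bare git pull has nothing to go on and asks you to specify one.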
We must then go over to the git fetch documentation to see what it does with the remote and, if present, branch name or refspec arguments. The remote is straightforward: it sets where the URL comes from, for instance. If the remote is origin, your Git looks up the result of git config --get remote.origin.url to know how to contact the other Git.
The refspec argument is where things get very squirrelly:
- A fetch with no refspec argument uses the defaults, from remote.<remote>.fetch. The standard one for origin is +refs/heads/*:refs/remotes/origin/*. This refspec means "obtain all the commits found by all of their branches, renaming those branches to my remote-tracking origin/* names." You can, however, have a single-branch clone or some other highly unusual configuration, where something different happens.
- A fetch with a refspec argument causes Git to look for names that match the refspec. Here, things are complicated. A refspec has a general form in which there is an optional force flag (a leading +), a source part, a colon, and a destination part. The source and destination can contain * wildcards (with certain restrictions; the exact restrictions depend on your Git version). They can be fully qualified references, beginning with refs/, or you can let Git do certain matching operations as described in the gitrevisions documentation.
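The remote's URL and default refspec can be read straight from the configuration. The sketch below does so in a disposable clone; the sandbox paths and the Demo identity are invented for the example, and a single-branch or otherwise customized clone would print something different.

```shell
set -eu
# Hypothetical identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
git -C "$work/upstream" commit -q --allow-empty -m "first"
git clone -q "$work/upstream" "$work/clone"

cd "$work/clone"
# Where "origin" lives, and which names a plain "git fetch origin" maps.
url=$(git config --get remote.origin.url)
refspecs=$(git config --get-all remote.origin.fetch)
echo "url:      $url"
echo "refspecs: $refspecs"
```

The --get-all form is used because a remote may carry more than one fetch line.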
If we avoid covering all possibilities and just concentrate on using a simple branch name as a refspec, we find that the way git fetch deals with this is:
1. It limits what it will fetch to any commits and other objects needed to synchronize with the commit to which the given branch name points in the other Git. That is, if their Git has name master meaning commit defb9a3..., your Git will make sure that you have this commit in your repository, plus all the parents needed and any other objects needed, by the time the git fetch finishes. But if their Git has other commits and objects that would be needed to update other remote-tracking names, your Git doesn't pick these up.
2. It then fetches as usual and writes the corresponding information to .git/FETCH_HEAD, where git pull can see it.
3. In Git version 1.8.4 and later, it opportunistically updates the remote-tracking name (origin/master in this case). It does this by reading through the remote.origin.fetch lines from the configuration.
In older versions of Git, step 3 does not occur.
This is a long-winded, but reasonably precise, way to say that git fetch origin fetches all their branches by default, but git fetch origin master fetches only their master. There are numerous possible special exceptions to this short version, but they're relatively rare.
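The difference is easy to reproduce in a sandbox. This is a sketch under stated assumptions: the upstream and clone directories, the dev branch, and the Demo identity are all made up, the default refspec is in place, and the Git in use is recent enough to update origin/<branch> opportunistically on a narrow fetch.

```shell
set -eu
# Hypothetical identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git commit -q --allow-empty -m "base"
main=$(git symbolic-ref --short HEAD)    # master or main, depending on setup
git branch dev
git clone -q "$work/upstream" "$work/clone"

# Upstream gains one new commit on each branch after the clone was made.
git commit -q --allow-empty -m "more on $main"
git checkout -q dev
git commit -q --allow-empty -m "more on dev"
main_tip=$(git rev-parse "$main")
dev_tip=$(git rev-parse dev)

cd "$work/clone"
git fetch -q origin "$main"              # narrow fetch: one branch only
after_narrow_main=$(git rev-parse "origin/$main")
after_narrow_dev=$(git rev-parse origin/dev)   # still the old commit

git fetch -q origin                      # full fetch: every branch
after_full_dev=$(git rev-parse origin/dev)
```

After the narrow fetch only origin/<main> has moved; origin/dev catches up only once the argument-free form runs.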
[1] In the not too distant past, git pull was a shell script and you could just read it and see where it ran git fetch. This made things a lot clearer. Now it's a big hairy C program. Does it still work exactly the same? Well, we can hope that bugs get fixed, so, probably not exactly the same. But the overall behavior is supposed to remain.
[2] It's not that you can't recover, if git fetch fetches something you didn't want to rebase-or-merge-with. It's that it's easier to run several commands, with git log inserted in the middle whenever that seems appropriate. Then there's no need to recover from a mini-disaster.
Which one is better?
This is one of those questions that doesn't have an answer, really. It's like asking whether chocolate ice cream is a better flavor than strawberry, or whether broccoli or cauliflower is a better vegetable.[3] They're different, and whether one is better for you, for one specific purpose, depends on both you and the purpose.
We can, however, note that if you let git fetch fetch everything now, a later git fetch will generally have less to fetch. If you restrict git fetch right now to just their master, and later you also need commits from their dev and have to run git fetch origin dev, you might have been better off running a single git fetch origin that got everything.
If you plan to make use of n remote-tracking names, where n > 1, a simple git fetch that just gets everything is convenient.
[3] It is, however, clear that of all cruciferous vegetables, Brussels sprouts are the worst.