I've just started learning Git. So far I know that before a commit goes into the repository, "snapshots" get added to the staging area. The question is: should I care about how long those "snapshots" can stay in the staging area? Will all the data get erased when I, say, turn my PC off?
If there is no lifetime for staging area contents, how does it differ from the repository itself? In this case, to me it looks like a second repository.
CodePudding user response:
Timothy Truckle's answer is correct, but I'll add a few items:
- `git worktree add` creates not just a new working tree, but also a new index / staging-area that is private to that new working tree.
- There is a bug (a rather bad one) in Git versions 2.5 through 2.14, fixed in 2.15, associated with this.
The bug is that the index / staging-area in these added working trees does have a limited lifetime, by mistake. Some files stored in it can be destroyed by mistake after 14 days pass. So if you have a pre-2.15 Git, and use `git worktree add`, get your work done within this added working tree within two weeks. (Or update to a corrected Git: updating suffices to keep their added index files intact; you need not do anything with the added work-trees; you just need to upgrade before the staging area gets whacked.)
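To see that each added working tree really gets its own index file, here is a minimal sketch. The directory and branch names (`main-tree`, `feature-tree`, `feature`) are made up for illustration, and it assumes a Git new enough to have `git worktree` (2.5 or later).

```shell
set -e
# Throwaway repository in a temp directory (all names here are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q main-tree
cd main-tree
git config user.name "Example"
git config user.email "example@example.com"
echo hello > file.txt
git add file.txt
git commit -q -m "initial commit"

# Add a second working tree on a new branch.
git worktree add -b feature ../feature-tree >/dev/null 2>&1

# Ask Git where each working tree keeps its index file.
main_index=$(git rev-parse --git-path index)
linked_index=$(git -C ../feature-tree rev-parse --git-path index)
echo "main index:   $main_index"
echo "linked index: $linked_index"
```

The two paths differ: the added working tree's index sits under `.git/worktrees/` inside the main repository, so staging something in one tree does not touch the other.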
Optional additional reading
If there is no lifetime for staging area contents, how does it differ from the repository itself? In this case, to me it looks like a second repository.
A Git repository is, to a first approximation, a big database of commits. These are numbered (by big, ugly, random-looking hash IDs) and Git pulls them out of the database by that number. So you must provide the hash ID of any commit to Git, before Git can find the commit. We have branch and other names so that we, mere humans, can use a secondary database—one that turns names into numbers—to help us (and Git) find the commits without having to memorize random-looking hash IDs.
The index, staging area, or (to use the oldest and worst term for it) cache, in Git is not actually a commit. It has a small collection of miscellaneous jobs, so there's no single perfect description of it, but the reason staging area is a pretty good name for it is that it holds, at all times, a proposed next commit.
(Sometimes, during conflicted merges, it holds a mess, which can't actually be committed. But even then it holds a sort of proposed next commit. It's just that the proposal has too much in it, including extra stuff that can't be committed. In this case `git status` will tell you about unmerged files. But if we ignore this particular case, and ignore the other extra roles that the index plays, such as speeding up `git status`'s handling of untracked files with an optional "untracked cache", that "proposed next commit" description holds up pretty well.)
Since a repository is a database of commits, and the index / staging-area is merely a single proposed commit that's not actually committed yet, there's a big difference between the index and a repository. The fact that there's just the one index (or more precisely, one per added working tree, plus one original one [1]) means that you can hold at most one extra commit here (or N + 1, where N is the number of added working trees).
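As for the original worry about turning the PC off: the proposed next commit is kept in an ordinary file on disk, so nothing expires. The sketch below, using a made-up file name `notes.txt`, stages a file in a brand-new repository and shows that the index exists and lists the file even though no commit has been made yet.

```shell
set -e
# Throwaway repository; notes.txt is just an illustrative file name.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "draft text" > notes.txt
git add notes.txt

# The staging area lives in an ordinary file inside the repository directory.
ls -l .git/index

# List what the proposed next commit currently contains.
staged_files=$(git ls-files)
echo "staged: $staged_files"

# No commit exists yet, so HEAD cannot be resolved to a commit.
if git rev-parse --verify -q HEAD >/dev/null; then
  has_commit=yes
else
  has_commit=no
fi
echo "any commit yet? $has_commit"
```

The staged content stays there until you commit it, unstage it, or replace it with a newer version of the file.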
It's worth looking at how `git status` works here too:
- There's a current commit, findable using the name `HEAD`. This commit, like all commits, is read-only: it literally can't change. (You can switch to some other current commit, i.e., change which commit is current, but you can't change any commit's content.)
- Then there's the proposed next commit in the index / staging-area. You can change what's in here. The files are stored in a pre-Git-ified format, ready to go into a commit: they are already de-duplicated and Git-ized once they're in the index / staging-area.
- Last, there's some set of files that you're working on / with, in your working tree. You can change these too, because these are just ordinary files. Any program on your computer can read or write these, not just Git.
So when you run `git status`, it gets around to doing two diffs, not just one. The first diff compares the files in `HEAD` to the files in the index / staging area. For each file that is the same, Git says nothing. For any file that is different, Git says staged for commit (the file is listed under "Changes to be committed").
Having run that diff, `git status` also runs a second diff, to compare the files in the index to those in the working tree. Once again, for files that are the same, Git says nothing. For any file that is different, Git says not staged for commit (the file is listed under "Changes not staged for commit").
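You can run the two comparisons yourself: `git diff --cached` compares `HEAD` to the index, and plain `git diff` compares the index to the working tree. A small sketch, with made-up file names `a.txt`, `b.txt` and `c.txt`:

```shell
set -e
# Throwaway repository with three committed files (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo one > a.txt
echo one > b.txt
echo one > c.txt
git add .
git commit -q -m "initial commit"

# a.txt: changed and staged, so the index now differs from HEAD.
echo "changed and staged" > a.txt
git add a.txt
# b.txt: changed but not staged, so the working tree differs from the index.
echo "changed, not staged" > b.txt
# c.txt: untouched, so neither diff mentions it.

staged=$(git diff --cached --name-only)   # first diff: HEAD vs index
unstaged=$(git diff --name-only)          # second diff: index vs working tree
echo "HEAD vs index:         $staged"
echo "index vs working tree: $unstaged"
git status --short
```

In the short status output, the first column reports the first diff and the second column reports the second one, which is why `a.txt` and `b.txt` show their `M` in different columns.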
The reason this sort of shadowy "hide the files" dance is useful becomes clear when you have a large project. Say there are 10,000 files in the current commit. There are therefore 10,000 files (plus or minus one or two maybe) in the index, and 10,000 files (plus or minus a few, plus maybe a bunch of untracked files that you will never commit) in the working tree. If `git status` reported (twice!) on all 10k files every time, how would you find the useful data (that you changed three of them, for instance) in all the noise? You want a useful summary, not a list of 9997 unmodified files with three useful names hidden in them.
(If you want the raw list, `git ls-files --stage` gets it.)
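A scaled-down sketch of that scenario, with 20 files instead of 10,000 (the `fileN.txt` names are made up): the raw index listing has one entry per file, while the status summary mentions only the three that changed.

```shell
set -e
# Throwaway repository with 20 committed files (names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
i=1
while [ "$i" -le 20 ]; do
  echo "content $i" > "file$i.txt"
  i=$((i + 1))
done
git add .
git commit -q -m "initial commit"

# Modify three of the twenty files without staging them.
for n in 3 7 12; do
  echo "edited" >> "file$n.txt"
done

# Raw index listing: mode, blob hash, stage number, path; one line per file.
git ls-files --stage | head -n 3
index_entries=$(git ls-files --stage | wc -l | tr -d ' ')
# Summary: only the files where the index and working tree differ.
reported=$(git status --short | wc -l | tr -d ' ')
echo "index entries: $index_entries, reported by status: $reported"
```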
[1] The reason for this somewhat klunky wording (why not just "number of working trees"?) is that Git has so-called bare repositories, which have no working tree at all. These repositories still have an index. So in a bare repo, we have one staging area, but zero work-trees. Add three work-trees and we now have four staging areas with three work-trees.
I'd argue that it was fundamentally a mistake to have the initial working tree and index/staging-area in any repository, and that all repositories should start "bare" and then have working trees added, perhaps one by default, but still as an add-on. A truly bare repo would therefore have no index and all working trees would be "added on", rather than having a weird distinction between the "primary working tree" of a non-bare repo and all the add-on ones. But it's long past too late for this. It would be a distinction without a difference to most users, but it would solve a few thorny internal issues, one of which is causing a lot of discussion on the mailing list right now.
CodePudding user response:
Keep in mind that there is only one staging area (per working tree).
This means that the content of the staging area is carried along to the other branch when you switch branches.
Git will even refuse to switch (unless you give the force switch `-f`, which throws away your local changes) when the target branch conflicts with the staging area, just as it does when local changes in the working tree conflict.
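Both behaviours can be reproduced in a scratch repository. In this sketch the branch names `conflicting` and `other-branch` and the file `app.txt` are invented for the demonstration, and `git checkout` is used for switching so that it also works on older Git versions.

```shell
set -e
# Throwaway repository (branch and file names are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo base > app.txt
git add app.txt
git commit -q -m "initial commit"

# A branch whose version of app.txt differs from the starting branch.
git checkout -q -b conflicting
echo "other line" > app.txt
git commit -q -a -m "change app.txt on conflicting"
git checkout -q -          # back to the starting branch

# Stage a change on the starting branch.
echo "work in progress" >> app.txt
git add app.txt

# Switching to a branch that conflicts with the staged change is refused.
if git checkout -q conflicting 2>/dev/null; then
  refused=no
else
  refused=yes
fi
echo "switch to conflicting refused? $refused"

# Switching to a branch without a conflict carries the staged change along.
git checkout -q -b other-branch
staged_after_switch=$(git diff --cached --name-only)
echo "still staged on other-branch: $staged_after_switch"
```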
Use the staging area to prepare clean commits. A clean commit is one where your project compiles and all automated (unit) tests pass.
Sometimes you make bigger changes that leave your code base not compiling, but some part of your work should already be secured.
Especially when doing Test Driven Development you often experience the opposite: you've completed a microcycle (the code base compiles and the unit tests pass), but your current unit test is not yet complete.
In both situations you stage the current state (`git add`).
As soon as your task is completed (the project compiles and all tests pass) you add the latest local changes to the staging area and make a commit of the staged state.
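The workflow described above might look like this on the command line. This is only a sketch: `feature.txt` and the commit messages are placeholders, and the "compile and run the tests" step is left out.

```shell
set -e
# Throwaway repository (file name and messages are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "step 0" > feature.txt
git add feature.txt
git commit -q -m "initial commit"

# Intermediate state worth securing: stage it, but do not commit yet.
echo "step 1" >> feature.txt
git add feature.txt

# Keep working; the staged version stays untouched by this edit.
echo "step 2" >> feature.txt
status_mid=$(git status --short)
echo "mid-task status: $status_mid"

# Task complete: stage the latest changes and commit the staged state.
git add feature.txt
git commit -q -m "Add feature steps 1 and 2"
status_end=$(git status --short)
echo "status after commit: '$status_end'"
```

While the task is in progress, `git diff` shows only what changed since the last `git add`, which makes it easy to review the newest step on its own.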
Yes, this way you will get lots of rather small commits. But this is a good thing:
- Small commits need fewer words to describe them, so the first line of the commit message is usually sufficient. This way `git log --oneline` shows a documentation of what you have done.
- Small commits have less potential for conflicts on `rebase`.
- Small commits make it easier to find a certain change in a file, because not so many files are changed and the commit message can give a better clue as to which commit may have that change.
- Small commits make it easier to transfer changes from one branch to another by cherry-picking.
And after all, you can still squash them later into one single commit if needed.
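A short sketch of small commits and a later squash. Squashing is usually done with an interactive rebase; since that needs an editor, this example swaps in `git reset --soft` followed by a new commit, which is one non-interactive way to combine the most recent commits. The file name and messages are placeholders.

```shell
set -e
# Throwaway repository (file name and messages are illustrative).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Three small commits, each described by a single line.
for n in 1 2 3; do
  echo "change $n" >> work.txt
  git add work.txt
  git commit -q -m "Small change $n"
done
git log --oneline
before=$(git rev-list --count HEAD)

# Squash the last two commits: move the branch back two commits while
# keeping their combined result staged, then commit that result once.
git reset -q --soft HEAD~2
git commit -q -m "Small changes 2 and 3, squashed"
git log --oneline
after=$(git rev-list --count HEAD)
echo "commits before: $before, after: $after"
```

The file contents are the same before and after the squash; only the history got shorter.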