I am new to Git, and I have the following "problem".
Let's assume I have a Git repo and that I am on master. I create a new branch local_tests from master, then go to this new branch with the command git checkout local_tests.
Then I create a directory with mkdir local_directory. Am I supposed to see this directory in master?
I've made several tests and it looks like it is what happens.
Thanks for your help !
CodePudding user response:
Am I supposed to see this directory in master ?
No, and you're not supposed to see it in local_tests either, since you haven't added it or committed it on any of your branches yet.
However, and that's the point, Git never wipes out the entire working directory and repopulates it when you check out another branch: it only deletes the files that are already tracked and replaces them with their correct versions, leaving everything else untouched.
That's why your local_directory directory still shows up when you check out another revision, and that's also why it will stop appearing there once you have committed it somewhere.
If, however, you create a file or a directory that already exists in a given revision but not in the current one, you'll face a conflict, since the two are not supposed to be the same thing. Git will then warn you when you try to check out that revision, which is a very common issue when beginning with Git.
CodePudding user response:
When checking out a commit (which is what happens when you switch branches), Git will delete a directory if all of the following conditions are true:
- The commit you are moving away from had at least one tracked file in that directory.
- The commit you are switching to has no tracked files in that directory.
- The directory has no untracked files in it.
Note I'm defining "in" to include subdirectories as well. So the file .\dir1\dir2\some-file.txt would be considered "in" both dir1 and dir2 for the purposes of this definition.
In your case, #1 is not met since you haven't committed anything into that directory yet on any branch. If you commit a file into that new directory, then after you switch branches the directory will disappear, as long as the other 2 conditions are also met.
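The three conditions above can be written as a small predicate. This is only a Python sketch of the rule as I've stated it, not Git's actual code; the function names and the use of plain sets of slash-separated paths are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def is_in(path, directory):
    """True if `path` lies in `directory` or in any of its subdirectories."""
    return PurePosixPath(directory) in PurePosixPath(path).parents


def directory_removed_on_checkout(directory, old_tracked, new_tracked, untracked):
    """Apply the three conditions from the list above (hypothetical helper)."""
    had_tracked = any(is_in(p, directory) for p in old_tracked)          # condition 1
    none_in_target = not any(is_in(p, directory) for p in new_tracked)   # condition 2
    no_untracked = not any(is_in(p, directory) for p in untracked)       # condition 3
    return had_tracked and none_in_target and no_untracked


# The question's situation: the directory exists, but nothing in it was ever committed.
print(directory_removed_on_checkout("local_directory", set(), set(), set()))  # False

# After committing local_directory/notes.txt on local_tests, then switching to master:
print(directory_removed_on_checkout(
    "local_directory", {"local_directory/notes.txt"}, set(), set()))  # True
```

The first call returns False because condition 1 fails, which matches what the question observed.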
Disclaimer: I have compiled this list from experimental evidence using Git 2.37.1.
CodePudding user response:
Am I supposed to see this directory in master?
The only truly correct answer to this question is Mu or "Not Applicable". The problem here is that you are conflating several concepts that, in Git, must be kept separate:
- What is in a commit?
- What does a branch name do? What if anything is the difference between a branch and a branch name?
- What is the working tree? What purpose does it have, and how does Git manipulate it? This brings in the question of Git's index or staging area.
The fact is that files and directories that you can see are not in Git. The work you do, when you work in Git, is with files that are not in Git. This seems weird and contradictory, and it would be except for several things that become clear once you understand the answers to the above three questions.
Let's start with the commit, because that's actually pretty simple and concrete, even if each commit is really hard to see: they're stuffed into Git's internal objects database and to view a commit, you will generally use various things that only show you part of a commit at a time. (Consider the parable of the blind men and the elephant.) Moreover, the commit is Git's raison d'être, so knowing what commits are about is crucial to using Git correctly.
About commits
Let's start the about-commits part with the fact that each commit is numbered. The commit's number is unique, and when I say "unique" I don't just mean in this repository, but rather in every Git repository everywhere in the universe. Every time you make a new commit, it gets a new number that has never been used before, and can never be used again.[1] This means the numbers are not simple counting numbers: we don't have commit #1 followed by #2 and so on, since those would obviously get re-used right away. Instead, each number looks totally random. These numbers are great for Git, which is a computer program, and horrible for humans.
The numbers are hash IDs or object IDs (OIDs) and they look like random junk. Git needs the number to find the commit, so you will in fact give Git the number, but except for special cases where you might use the mouse to cut-and-paste a hash ID, you'll mostly use a name to get Git to look up the number for you.
Each commit stores two things. (A commit is made up of various pieces but you don't have to care about this.)
One part of a commit is a full snapshot of every file. The files are stored in a special, read-only, Git-only format, where the files are compressed and, importantly, have their content de-duplicated within and across commits. So the repository doesn't grow enormously fat, even though every commit logically stores every file. Most commits we make mostly duplicate the files from previous commits, because we've only changed a few files. So only the new or changed files have to get stored; the others are simply re-used and take no additional space.
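To make the de-duplication idea concrete, here is a toy content-addressed store in Python. It names each file's content by a SHA-1 hash over a short "blob" header plus the bytes, which is how Git computes blob IDs in a default (SHA-1) repository; the dictionary standing in for the object database and the store helper are my own assumptions, and the sketch leaves out compression entirely.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Name file content by hashing a 'blob <size>\\0' header followed by the bytes."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


object_store = {}  # toy stand-in for Git's object database: ID -> content


def store(content: bytes) -> str:
    """Save content under its ID; identical content is kept only once."""
    oid = blob_id(content)
    object_store.setdefault(oid, content)
    return oid


# Two snapshots (path -> blob ID); only README changed between them.
snapshot_1 = {"README": store(b"hello\n"), "main.py": store(b"print('hi')\n")}
snapshot_2 = {"README": store(b"hello, world\n"), "main.py": store(b"print('hi')\n")}

print(snapshot_1["main.py"] == snapshot_2["main.py"])  # True: the unchanged file shares one ID
print(len(object_store))                               # 3 stored objects, not 4
```

Both snapshots list every file, yet the unchanged main.py is stored once.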
The other part of a commit is its metadata. Here, Git stores information about the commit, such as the name and email address of the person who made it. Git adds some date-and-time stamps and other useful information. One of the things Git adds, something absolutely necessary for Git itself, is that each commit stores a list of previous commit hash IDs.
By storing the previous-commit hash IDs—usually just one per commit—Git sets things up so that, if we can tell Git how to find the latest commit, Git can work backwards from there to find all the earlier commits.
There's one more thing to note here. The rather magical numbering system that Git uses requires that every part of every Git commit be totally read-only. So Git not only stores all the files in each snapshot read-only because that's what we usually want, it does that because it has to. Once you make a commit, you literally can't change it. (You can replace it with a new-and-improved commit, under various circumstances, though. The old, bad commit will continue to exist for a while, but we'll just stop using it.)
[1] This is technically impossible (due to the pigeonhole principle) and Git will someday fail. The sheer size of the number space, and the fact that re-using a number is OK if the two repositories never "meet", helps put this off for, we hope, so many billions of years that we don't care.
Branch names
A branch name, in Git, stores one hash ID.
What good is storing one hash ID? We actually just said that: "if we can tell Git how to find the latest commit, Git can work backwards from there ...".
A branch name, in Git, stores the latest commit hash ID, by definition. Whatever hash ID is in the branch name, that is the latest commit in that branch. Git calls this the tip commit. That commit—the tip—remembers the hash ID of the second-to-latest commit, which remembers the hash ID of the third-to-latest, and so on.
Suppose we have some series of commits, ending at commit H, like this:
... <-F <-G <-H
Each commit remembers the actual hash ID of the previous commit. Commit H, whatever its real hash ID is, stores the actual hash ID of earlier commit G. We say that H points to G. But G stores the hash ID of an earlier commit, which we'll call F: G points to F. F in turn points to another, even-earlier commit, and so on.
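The backwards walk can be sketched in a few lines of Python. This is a toy model: single letters stand in for real hash IDs, and I'm assuming F is the very first commit so that the walk has somewhere to stop.

```python
# Toy commit graph: commit ID -> parent ID (None marks the first commit).
parents = {"H": "G", "G": "F", "F": None}


def history(tip):
    """Yield commits from `tip` backwards by following parent links."""
    current = tip
    while current is not None:
        yield current
        current = parents[current]


print(list(history("H")))  # ['H', 'G', 'F']
```

Given only the ID of the last commit, everything earlier is reachable; nothing in the model lets you walk forwards from F to H.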
All we have to do, to have Git find every commit starting from H and working backwards, is to give Git the correct hash ID for commit H. Rather than jot it down on paper, or a whiteboard, or stick it in a file or something, we just tell Git: Hey, save this hash ID H for me. We do that by creating a branch name, and we can add this branch name to our drawing like this:
...--G--H <-- master
Let's say we want to create a new branch. We run git branch local_tests, for instance. Git creates another name, and sticks the same hash ID into the new name:
...--G--H <-- local_tests, master
So both names select the same commit (the same saved snapshot and metadata), which finds the same earlier commit G, and so on.
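As a rough Python picture of this, treat the set of branch names as a small table from name to commit ID. The dictionary and the create_branch helper are hypothetical; the point is only that making a branch copies one ID and creates no commit.

```python
# Toy model: a branch name is an entry mapping a name to one commit ID.
branches = {"master": "H"}


def create_branch(new_name, start="master"):
    """Rough analogue of creating a branch: copy the hash ID, touch no commit."""
    branches[new_name] = branches[start]


create_branch("local_tests")
print(branches)  # {'master': 'H', 'local_tests': 'H'}
```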
This gets us to the difference, if there is any, between a branch and a branch name. The branch name is clear: it's master or local_tests. But what's the branch? That depends on the person saying branch, and what they mean when they say it. Sometimes they mean the name. Sometimes they mean commit H, the tip commit. Sometimes they mean a set of commits that includes H, but includes some earlier commits too.
The word branch is, in short, ambiguous: people often say it without conveying what they mean, or even without quite knowing what they mean. You just have to guess, or ask, to find out more. In this case, the set of commits leading up to and including commit H is on both branches at the same time.
We're about to use just one of these two branch names at a time, though, so we need a way to draw which name we're actually using. To do that, we'll attach the special name HEAD, written in all uppercase like this, to one of the two branch names. Let's pick local_tests:
...--G--H <-- local_tests (HEAD), master
We get into this state by running git checkout local_tests or git switch local_tests (these two commands both do the same thing here).
Your working tree
I mentioned earlier that all the files inside each commit are entirely read-only. They're in a special format that only Git can read, and nothing—not even Git itself—can overwrite them with new content. This means they're quite useless for getting any actual work done. The main thing they're useful for is serving as an archive, like a tar or WinRAR or zip archive.
To do anything with the files in the commit, then, we have to have Git extract the files. This is the main job of git checkout or git switch: we pick some branch name that we'd like to have as our current branch, and we tell Git: extract all the files from the tip commit of the branch so that we can use them. Git looks up the hash ID from the branch name, finds that it's hash ID H, and extracts all those files.
Git has to put those files somewhere, and that "somewhere" is your working tree. Your working tree now has files in it. These files have come out of Git. But they are not in Git. They just came out of Git.
You now work on / with these files. You can create new directories and, if you like, fill them with more files, remove some files, and in general mangle and mutilate your data however you like. These are, after all, ordinary files. All your computer programs work on / with them! But changing these files has no effect on Git. Git does not even look at them, not yet. If and when you're ready to make a new commit, you need to tell Git to scan your working tree.
Git's index or staging area
In almost any other (non-Git) version control system, you would now invoke its "commit" verb. In Git, however, you're forced to run git add at this point. Why? The answer lies in a thing Git partly hides from you, but which is so important (and/or so poorly named) that it actually has three names. Git calls this thing the index, or the staging area, or (rarely these days, and mostly seen in flags like git rm --cached) the cache.
The stuff in Git's index is crucial, because when you run git commit, what Git will put in the new commit, as the new commit's permanent snapshot, is nothing more or less than the set of files that are in Git's index right then. This tells us what's in the index, namely files, but leaves out some important parts.
Now, we know that commits hold files, in a special, read-only, Git-only, compressed and de-duplicated format. And we know that our working tree holds files and directories (or "folders" if you prefer that term) in the form our computer's operating system requires, e.g., path/to/file.ext is a directory named path holding a directory named to holding a file named file.ext. In Git's internal format, this is just a file with slashes in its name, path/to/file.ext.
The trick here is what's in Git's index. Each file in the index is kind of a half-way point, partway between the committed file and the useful one. It's stored in the compressed and de-duplicated format, as a file with a name with embedded slashes, but unlike the frozen files in commits, it's not actually frozen. You can replace or remove it, and you can add all-new files such as path/to/another.ext.
When you first check out a commit like commit H, Git fills in its index from the saved, frozen files. It also fills in your working tree with those same files, but expanded into useful everyday format and turned into files-within-folders (directories). Git will, at this time, make any new directories it has to make, to accommodate the OS's requirement that path/to/file.ext be made up of path containing to containing file.ext.
It's this filling-in process that populates both Git's index and your working tree, and having just done the filling-in, there are now three "active copies" of each file:
- there's a frozen one in the current commit;
- there's a not-quite-frozen, compressed, de-duplicated copy (or "copy" since it's a duplicate by definition) in Git's index; and
- there's a useful copy in your working tree
and the stored data in these three copies all matches.
As you edit files in your working tree—and/or add and remove files—your working tree drifts away from the index copies. The index copies still match the commit copies though.
Running git add tells Git: read the working tree and update the index copies of various files to match the working tree copies. Note that there are a lot of special cases and caveats here (having to do with .gitignore for instance) that I'm skipping to keep this answer short (okay, shorter), but that's the essential purpose of git add: to update Git's index.
Having updated Git's index to match your working tree, or as much of it as you want to have match, at least, you can now run git commit. The new commit that Git makes will freeze, into its snapshot, copies of whatever's in Git's index right now. This means you use git add to update the proposed snapshot.
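The add-then-commit flow can be modeled with three dictionaries. This is a deliberately simplified Python sketch: the real index holds blob IDs plus other bookkeeping rather than raw text, and the add and commit helpers are hypothetical stand-ins for the real commands.

```python
# Toy model of the three "copies": commit snapshot, index, working tree.
# Each maps a path to file content.
head_snapshot = {"README": "hello\n"}
index = dict(head_snapshot)          # filled in from the commit at checkout
working_tree = dict(head_snapshot)   # ordinary files you can edit

# Edit one file and create another: only the working tree changes.
working_tree["README"] = "hello, world\n"
working_tree["local_directory/notes.txt"] = "draft\n"


def add(path):
    """Rough analogue of git add: copy the working-tree version into the index."""
    index[path] = working_tree[path]


def commit():
    """Rough analogue of git commit: freeze whatever is in the index right now."""
    return dict(index)


add("README")
new_snapshot = commit()
print(sorted(new_snapshot))  # ['README']: notes.txt was never added
```

The new snapshot picks up the edited README but not the file in local_directory, because only the former was copied into the index.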
Consequences of the above
If you don't run git commit, nothing has changed in any of the stuff Git is actually using. The stuff in your working tree is not in Git. (The contents of files from git add may sort of be halfway into Git, and this is occasionally useful for disaster recovery, but shouldn't really be counted on.)
For any new files you create in your working tree and never git add, Git knows nothing about these files. Various directory-scanning operations, including git status, will sometimes refer to these as untracked files. An untracked file is simply any file that's actually in your working tree right now and not in Git's index right now, regardless of how that came about.
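That definition is just a set difference. A minimal Python sketch, with a hypothetical untracked helper and made-up paths, and ignoring .gitignore handling altogether:

```python
def untracked(working_tree_paths, index_paths):
    """Paths present in the working tree but absent from the index."""
    return sorted(set(working_tree_paths) - set(index_paths))


# Hypothetical state: README is tracked; the file in local_directory was never added.
print(untracked({"README", "local_directory/notes.txt"}, {"README"}))
# ['local_directory/notes.txt']
```

Note that the model only contains files. An empty directory holds no files, so it has nothing to report as untracked.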
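In general Git doesn't touch an untracked file, but remember that if you're on commit H, and ask to switch to, say, commit F or commit K or something, it's possible that the other commit has a file of the same name as the currently-untracked file. In that case Git normally refuses to switch and tells you that the untracked file would be overwritten, rather than silently destroying it.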
There's an interesting set of special cases here as well. Let's look at the biggest one of these. Suppose we have this:
...--G--H <-- local_tests (HEAD), master
That is, we're using commit H right now. Regardless of what we may have changed in our working tree and/or in Git's index, commit H is the current commit, and the name local_tests is the current branch name.
Let's say we now ask Git to switch branch names, to master, with git checkout master or git switch master. We're telling Git to move from commit H to ... commit H! It's the same commit, as both names select commit H; we'll end up with:
...--G--H <-- local_tests, master (HEAD)
To make this switch, Git does not have to touch anything in its index or your working tree, because commit H exactly matches commit H in every way. So as an optimization, Git doesn't bother doing anything.
A quick look at making new commits
Suppose, though, that instead of switching (or after switching back) so that we have this:
...--G--H <-- local_tests (HEAD), master
we edit some files, and/or create new files, and run git add on them to create or update index copies of these files. Then we run git commit, telling Git: Using the files in your index, make a new commit, with a new unique hash ID. Use the current commit as the new commit's parent. This gets us a new commit, which we'll call I, that points back to existing commit H:
...--G--H
         \
          I
This commit is the latest commit (because we just made it). So Git must now stuff its new hash ID, whatever that is, into the current branch name. The current name is the one with HEAD attached. So Git does exactly that:
...--G--H   <-- master
         \
          I   <-- local_tests (HEAD)
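In the same toy style, here is what making commit I does to the names. The dictionaries, the head variable, and the commit helper are assumptions for illustration; letters again stand in for real hash IDs.

```python
# Toy model: branch names map to commit IDs; `head` names the current branch.
branches = {"master": "H", "local_tests": "H"}
parents = {"H": "G"}
head = "local_tests"


def commit(new_id):
    """Record the current commit as parent, then move only the current branch name."""
    parents[new_id] = branches[head]
    branches[head] = new_id


commit("I")
print(branches)      # {'master': 'H', 'local_tests': 'I'}
print(parents["I"])  # H
```

Only the name with HEAD attached moves; master still selects H.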
If we now git checkout master or git switch master, we'll be changing commits. Git will need to remove, from its index and our working tree, the files that go with commit I. Git will need to fill in, in its index and our working tree, the files that go with commit H.
Some files are probably exact duplicates, and Git can tell which ones very easily because it's de-duplicated those duplicates. So for those files, Git can leave them alone, and it will do exactly that. It will only remove and/or replace files where they're different. Note that this is why moving from commit H to commit H did nothing: every file was the same. But this time some files are (probably) different and Git will have to touch those files.
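The "only touch what differs" idea can be sketched by comparing two snapshots. This is a Python toy, not Git's checkout code: snapshots are modeled as path-to-blob-ID mappings, the short strings stand in for real hash IDs, and plan_checkout is a hypothetical name.

```python
def plan_checkout(current, target):
    """Compare two snapshots (path -> blob ID) and list what a switch must touch."""
    remove = sorted(p for p in current if p not in target)
    write = sorted(p for p, oid in target.items() if current.get(p) != oid)
    return remove, write


# Hypothetical snapshots for commits I and H.
commit_I = {"README": "a1", "main.py": "b2", "local_directory/notes.txt": "c3"}
commit_H = {"README": "a1", "main.py": "b7"}

print(plan_checkout(commit_I, commit_H))  # (['local_directory/notes.txt'], ['main.py'])
print(plan_checkout(commit_H, commit_H))  # ([], [])
```

README has the same ID in both snapshots, so it is left alone. Files that appear in neither snapshot, such as untracked ones, never show up in the plan at all.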
Final notes
The git switch code normally makes sure you won't lose any unsaved work. The git checkout command, when it invokes this code, will do the same. The reason there are two commands now is that git checkout has extra operational modes, some of which are used to tell Git please clobber my work. It's too easy to invoke these by mistake! Don't use git checkout; use git switch, where you can't invoke the wrong mode by mistake.
(Since Git version 2.23, which introduced git switch, git checkout tries to detect the worst mistake and make you confirm what you meant, but it's still better to use the new commands. To get the "please clobber my work" mode, Git now has git restore as well.)