I am new to Git and I have made some unwanted changes like this:
Now I don't need any of these, so I tried this:
git reset --hard
But it still shows them!
So how can I reset this and get back to where I haven't created any of these unwanted files?
CodePudding user response:
An untracked file:
- did not come out of the current commit (see possible exception below, in the long description);
- is not in the proposed next commit; and
- is lying around in your working tree.
As such, the file is not in Git now, it will not be in Git tomorrow, and so no Git operation will touch it.
If you don't want these files to be in the next commit, there is nothing else you need to do.
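A minimal sketch of why git reset --hard seems to do nothing here, run in a throwaway repository. The temporary directory, the user name and email, and the file name unwanted.txt are all made up for illustration:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "tracked" > README.md
git add README.md
git commit -q -m "initial commit"

echo "edited"  > README.md        # modify a tracked file
echo "scratch" > unwanted.txt     # create a file Git has never been told about

git reset -q --hard               # restores tracked files only

readme=$(cat README.md)           # back to the committed content
git status --short                # still lists unwanted.txt as untracked (??)
```

The tracked file is restored, while the untracked one is left exactly where it was.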
Long: What's going on here
Git isn't about files. Git is about commits.
Those new to Git often think it's about files, but it's not. Commits do contain files, but Git is about the commits. They might also think that Git is about branches, but that's not really true either: branch names do help you (and Git) find commits, but Git is still all about the commits.
Since Git is all about commits, you are required to know about them. Here's what you need to know right now:
Each commit has a number. These aren't simple counting numbers: we don't have commit #1, followed by #2, then #3, and so on. Instead, each commit gets some really huge, big and ugly number, seemingly random, like
5a73c6bdc717127c2da99f57bc630c4efd8aed02
for instance. This number is unique to this one particular commit. Every commit everywhere, no matter who makes it, when, where, etc., has to get its own unique number. This is why the number has to be so huge. The number is actually the output of a cryptographic hash function.
Each commit stores two things:
- Every commit stores a full snapshot of every file. This is the main data part of a commit. The files inside the commit are in a special, compressed, Git-only and de-duplicated form: they're not ordinary files at all (and Git can store files that some systems such as Windows can't extract, which creates problems for Windows users). This acts as a permanent archive, like a tar or zip file of all the files in the commit.
- And, every commit stores some metadata: some information about the commit itself, such as who made it and when. This metadata lets Git tie newer commits back to older ones, which is how Git gets a lot of its work done.
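If you want to look at these two parts yourself, something like the following should work in a scratch repository. The repository location, identity, and file content are placeholders chosen for this example:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

commit_id=$(git rev-parse HEAD)   # the big ugly number of the current commit
echo "$commit_id"

kind=$(git cat-file -t "$commit_id")          # object type
git cat-file -p "$commit_id"      # metadata: tree, author, committer, message

snapshot=$(git ls-tree -r --name-only HEAD)   # names of the files in the snapshot
echo "$snapshot"
```

The hash you get will differ from run to run, since the metadata includes the commit time.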
Due to the cryptographic numbering scheme, no part of any commit can ever be changed. Once you've made a commit, it's frozen that way forever. (If you make a bad commit by mistake, you can just stop using it. It sticks around for a long time, in case you want it back, but eventually Git will figure out that not only aren't you using it, but—under the right conditions—nobody else ever will be able to find and use it either and therefore Git can remove it entirely. But that's a trickier matter that we won't worry about here.)
But if a commit is read-only (and it is), and the files inside a commit are stored in a format that only Git itself can read (and they are) and literally nothing can write, how will we ever get any actual work done? Git has the same answer here as all version control systems, which all share this kind of problem. You don't work on the files that are in Git. You work, instead, on copies that Git takes out of Git.
Git extracts these usable, workable copies of your files on demand, when you check out a commit with git checkout or git switch. The usable files go into a work area, which Git calls the working tree or work-tree. This is pretty simple and straightforward: your working tree is where you get your work done. It has files you can see and use. But these files are not in Git.
Making new commits
Other (non-Git) version control systems start out this same way: you extract a commit, and that gets you useful files. Then you edit the files as needed and when you're ready, you run, e.g., hg commit (for Mercurial, a different version control system). This non-Git VCS figures out what you did to the files and makes a new commit and you're all set.
Git makes things much harder. Instead of reading your working tree when you run git commit, Git sets up a separate thing. This thing has three names, perhaps because the first names were terrible ones. The three names are:
- the index: this name doesn't mean anything, which has some good and bad aspects; I tend to use this one myself;
- the staging area: this name reflects how you use the thing, and is perhaps the best name; and
- the cache: this name is not so good, and Git mostly avoids it now, but it lingers in flags like git rm --cached.
This thing—the index or staging area—holds your proposed next commit. When you first check out some commit, Git fills it in with copies (or "copies", because they're de-duplicated already: the index "copy" is in Git's internal format) of all the files that are in the commit you just checked out. These files also go into your working tree (as real, ordinary files, rather than weird Git-ized de-duplicated magic).
When you modify the working tree copy of some file, that just changes the working tree copy of that file. No Git file has changed anywhere. The proposed next commit still holds the previous version of the file, the one Git extracted from the current commit.
If you want Git to commit the updated copy, not the old copy you took out of the earlier commit, you must tell Git to update its index / staging-area. You do this with git add, e.g., git add file.ext. This tells Git to read the working tree version of file.ext, compress it into Git's internal format, arrange for the de-duplication as appropriate, and get it all ready for the next commit. This next-commit-ready copy of the file goes into the index / staging-area and now you've updated the proposed next commit.
What all this means is that there are, at all times, three copies of every file (although some of them are Git-ized and hence de-duplicated):
HEAD         index        work-tree
---------    ---------    ---------
README.md    README.md    README.md
main.py      main.py      main.py
for instance, if you have two files in the current commit. The HEAD copies are read-only (and de-duplicated); the index copies are replaceable (but also de-duplicated); and the work-tree copies are usable, ordinary files that aren't de-duplicated but let you get work done.
Running git commit makes a new commit from the copies that are in the index. So that's why these copies exist: to be ready for the next commit.
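To convince yourself that the three copies really are separate, you can make all three differ and then read each one back. This is only a sketch in a scratch repository; the file content and the identity are invented for the example:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

printf 'line 0\n' > README.md
git add README.md
git commit -q -m "initial commit"             # HEAD copy: 1 line

printf 'line 0\nline 1\n' > README.md
git add README.md                              # index copy: 2 lines

printf 'line 0\nline 1\nline 2\n' > README.md  # work-tree copy: 3 lines

# HEAD:path reads the committed copy, :path reads the index copy
head_lines=$(git show HEAD:README.md | wc -l | tr -d ' ')
index_lines=$(git show :README.md | wc -l | tr -d ' ')
work_lines=$(wc -l < README.md | tr -d ' ')

echo "HEAD=$head_lines index=$index_lines work-tree=$work_lines"
```

Running git commit at this point would store the two-line index copy, not the three-line file you see in the work-tree.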
Untracked files
Your working tree is just an ordinary directory (or folder, if you prefer that term) on your computer. Because it is an ordinary folder, you can create new files here, or remove existing ones. Git won't know or care that you did so: the proposed next commit is hidden away in Git's index (or staging area). It doesn't have the new files:
HEAD         index        work-tree
---------    ---------    ---------
README.md    README.md    README.md
main.py      main.py      main.py
                          new.txt
Any newly-created files that are in your working tree right now, but aren't in Git's index right now, are untracked files. That's how untracked file is defined, in Git: an untracked file is a file that does exist in your working tree, but doesn't exist in Git's index.
If you run git add on one of these untracked files, Git will read it, compress it into the internal format, check for duplicates, and so on, and add the new file to Git's index, making it staged for commit. Now the file does exist in Git's index (and also in your working tree) and so it's no longer an untracked file. By changing the set of files in Git's index, you've changed which files are untracked:
HEAD         index        work-tree
---------    ---------    ---------
README.md    README.md    README.md
main.py      main.py      main.py
             new.txt      new.txt
Similarly, you can use git rm --cached to remove a file from Git's index, but leave it in your working tree. This causes a file that was tracked to become untracked. (That's the special exception I mentioned earlier: if a file was in the commit you checked out, but then you removed the index copy, it's now untracked.)
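The back-and-forth between untracked and tracked can be watched directly. In this sketch, git ls-files --others --exclude-standard is used to list untracked files; the scratch repository, identity, and file names are placeholders:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "new" > new.txt              # exists in the work-tree only

untracked_before=$(git ls-files --others --exclude-standard)

git add new.txt                   # now also in the index, so it is tracked
untracked_after_add=$(git ls-files --others --exclude-standard)

git rm -q --cached new.txt        # remove the index copy, keep the work-tree file
untracked_after_rm=$(git ls-files --others --exclude-standard)

echo "before add:   $untracked_before"
echo "after add:    $untracked_after_add"
echo "after rm:     $untracked_after_rm"
```

The file new.txt never leaves the working tree; only its presence in the index changes.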
Again, all this updating happens in Git's index, which you can't see. There's no obvious place to look for the files in Git's index.[1] If Git were totally silent about this, though, it would make Git even harder to use than it already is. So git status will report untracked files.
Running git status actually does a bunch of things:
- First, it has Git print out your current branch name, and some other useful information.
- Next, it has Git compare the current commit, i.e., HEAD, to the index. For all the files that match, git status says nothing at all. For any file that's different, git status says that this file is staged for commit. That means the index copy of the file is different.
- Then git status has Git compare the index files to the working tree files. For all the files that match, git status says nothing at all. For any file that's different, git status says that this file is not staged for commit. That means the working tree copy of the file is different.[2]
In all of this, though, git status has said nothing yet about untracked files. It does collect up their names: when Git runs the second comparison, any untracked files show up, so Git now has a full list of them.
Finally, git status shows the untracked files, except that this is where the .gitignore file kicks in, if you have one.
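The two comparisons can also be run by hand, which may make the git status output easier to read. This is a sketch under assumed file names in a scratch repository; git diff --cached compares HEAD to the index, and plain git diff compares the index to the working tree:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
echo "print('hi')" > main.py
git add README.md main.py
git commit -q -m "initial commit"

echo "staged edit" >> README.md
git add README.md                 # index now differs from HEAD
echo "unstaged edit" >> main.py   # work-tree now differs from index
echo "scratch" > new.txt          # in the work-tree only

staged=$(git diff --cached --name-only)                  # HEAD vs index
unstaged=$(git diff --name-only)                         # index vs work-tree
untracked=$(git ls-files --others --exclude-standard)    # not in the index

echo "staged for commit:     $staged"
echo "not staged for commit: $unstaged"
echo "untracked:             $untracked"

git status --short                # same information in compact form
```

In the short format, the first status column reflects the HEAD-vs-index comparison, the second column the index-vs-work-tree comparison, and ?? marks an untracked file.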
[1] You can run git ls-files --stage to dump out what's in Git's index. This isn't meant for everyday work though, and it's not a good way to get stuff done.
[2] Note that all three copies can be different. For instance, check out some existing commit that has a README.md. Add a line to the file and run git add README.md. Then add a second line to the file. Now all three copies are different. Try git status. The file is both "staged for commit" and "not staged for commit". That just means HEAD-vs-index shows the first added line, and index-vs-working-tree shows the second added line.
If you run git commit now, Git commits the index copy of each file. So the README.md file in the new commit has just the one added line. If you run git add README.md now instead, Git replaces the index copy, with the one added line, with a new index copy with both added lines.
Untracked files can be "ignored"
If some file is untracked, git status would normally list it in the "Untracked files" section of its output, the section you don't like, which shows files like app/Popup.php. The point of listing this file here is to alert you that you have not yet added the file, and it won't be in the next commit until you do.
But what if that file is not supposed to be in the next commit, or even in any commit? Well, one answer to that problem is that you can remove the file right now:
rm app/Popup.php
Now it's no longer in your working tree. (Note that this removal was not done by Git, or even for Git, it's just something you did on your own.) Since this file was never in Git, it's now gone for good—at least as far as Git goes. Git can't help you get it back. It was never in Git!
But: maybe you don't want to remove it. Maybe you'd like to tell Git two things at the same time:
- Hey Git, stop complaining about app/Popup.php!
- Hey Git, don't go adding app/Popup.php to your index! It shouldn't be committed!
To do both of these things, you can list the file in a .gitignore file:
echo app/Popup.php >> .gitignore
or:
echo /Popup.php >> app/.gitignore
(these will have the same effect here).
Listing a file, or a glob pattern like *.o or *.pyc, in a .gitignore file, tells Git: stop complaining about this file / files that match this pattern when they are untracked. So that makes git status more useful: it only warns you about files that should be tracked now.
It also stops git add from adding the file. You can force git add to add the file (with git add -f), but by default, git add won't add the file to Git's index now.
None of this has any effect if the file is already in Git's index. So .gitignore is not really a list of files to ignore. It's really .git-don't-complain-about-these-files-if-they-are-untracked-and-do-not-add-them-to-the-staging-area-unless-I-force-it-because-they-are-supposed-to-stay-untracked-okay?, or something like that. But having that as the file name would be crazy, so Git just calls it .gitignore.
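Here is a rough sketch of all three effects of a .gitignore entry: git status goes quiet, a plain git add is refused, and a forced add still works. The scratch repository, the identity, and the content written into app/Popup.php are invented for the example:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

mkdir app
echo "<?php" > app/Popup.php      # untracked file we do not want to commit
echo "app/Popup.php" >> .gitignore

# check-ignore exits 0 when some ignore rule matches the path
if git check-ignore -q app/Popup.php; then ignored=yes; else ignored=no; fi

status_out=$(git status --short)  # mentions .gitignore itself, not app/Popup.php
echo "$status_out"

# A plain add of an ignored path is refused with a hint to use -f
if git add app/Popup.php 2>/dev/null; then plain_add=accepted; else plain_add=refused; fi

git add -f app/Popup.php          # forcing overrides the ignore rule
tracked=$(git ls-files app/Popup.php)

echo "ignored=$ignored plain_add=$plain_add tracked=$tracked"
```

Once the forced add has put the file in the index, the .gitignore entry no longer has any effect on it: the file is tracked, and git status reports it like any other staged file.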
The bottom line
Untracked files won't be in your next commit. Saying "what do I do about these files that I don't want in my next commit" is pointless: they already won't be in your next commit. Asking how to make git status more useful, by having it not list the files, is useful. It's really pretty straightforward though: you just need to get used to Git's weirdness about having three copies of each file at all times, and that the name .gitignore is misleading: the files aren't ignored at all, they're just silently untracked as long as they are actually untracked (which is hard to tell without using the debug-like git ls-files).
CodePudding user response:
As git status tells you, those files are untracked. That means they are outside of Git's purview. Therefore git reset --hard does nothing to them.
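The command for telling Git to get rid of untracked files is git clean. It comes with various options; for example, it won't recurse into folders without being told to do so, so perhaps you want git clean -d. (With the default configuration, git clean also refuses to delete anything unless you give it -f, -i, or -n.)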
(This is a dangerous command: it permanently deletes files that Git has no copy of, so saying it at the wrong time can cost you work that nothing can bring back. I strongly recommend you say git clean -d -n to do a dry run and see what will actually happen first.)
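A cautious sequence might look like the sketch below, tried first in a scratch repository rather than in your real project. The file names and identity are placeholders; -n previews, and -f is what actually deletes:

```shell
set -e
repo=$(mktemp -d)                 # scratch repository; the location is arbitrary
cd "$repo"
git init -q
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "initial commit"

echo "scratch" > unwanted.txt     # untracked file
mkdir app
echo "<?php" > app/Popup.php      # untracked file inside an untracked folder

preview=$(git clean -d -n)        # dry run: lists what would go, deletes nothing
echo "$preview"

git clean -d -f -q                # really delete untracked files and folders

remaining=$(git ls-files --others --exclude-standard)
echo "untracked files left: ${remaining:-none}"
```

Tracked files such as README.md are untouched; only the untracked file and folder are removed, and Git keeps no copy of them.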