Git - Made a commit with a delete to an untracked file

So for context, I'm working on a project and it has a package-lock.json file.

I accidentally made a commit and pushed with a delete on that file (it's still here locally) so anyone pulling that commit will have his package-lock.json file deleted although it is supposedly untracked and still want it untracked of course.

Does anyone know how to undo that?

CodePudding user response：

Let's mention two things before we really get started:

It wasn't an untracked file.
As Obsidian Age noted in a comment, package-lock.json usually should be committed (and hence tracked). See "Do I commit the package-lock.json file created by npm 5?"

Now, let's get started on what's going on, and all of your actual options. First we have to define, in a precise way, exactly what a tracked file is. Fortunately, unlike many things in Git, this has a very simple definition: A file in Git is tracked if and only if it is in Git's index right now. Unfortunately, this simple definition uses "Git's index", which ... well, isn't all that simple. Fortunately, it's not all that complicated either, and you must know what Git's index is and does to use Git. You can sometimes get away with ignoring it, but eventually it will bite you—just as it did here.
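If you want to check that definition for yourself, here is a minimal sketch. It assumes a throwaway repository in a temporary directory, and notes.txt is just a made-up second file. git ls-files lists what is in the index, so it is one way to ask "is this file tracked?":

```shell
set -e
cd "$(mktemp -d)"
git init -q

echo '{}' > package-lock.json
echo 'scratch' > notes.txt          # hypothetical file that is never added
git add package-lock.json           # puts package-lock.json into the index

# git ls-files lists the paths currently in the index, i.e. the tracked files.
tracked=$(git ls-files)
echo "tracked: $tracked"

# --error-unmatch makes git ls-files exit non-zero for a path not in the index.
if git ls-files --error-unmatch notes.txt >/dev/null 2>&1; then
  notes_state=tracked
else
  notes_state=untracked
fi
echo "notes.txt is $notes_state"
```

Note that nothing has been committed yet: being in the index is all it takes for package-lock.json to count as tracked.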

Git's index is how Git builds new commits. Remember that Git itself is all about commits—not files, not branches, but commits. Each commit has a full snapshot of every file, like a tar, WinRAR, or zip archive, except that it's fancy and Gitty and has its files de-duplicated so that repeated commits don't make the repository grow tremendously fat. (Each commit also has some metadata, but we'll ignore that entirely here.)

The problem with these archives is the same as the problem with any archive: it's in some archive format that your regular programs can't read. (And Git's archives are strictly read-only, at least in principle: not even Git is allowed to overwrite them.) So instead of trying to use the archive directly, we have some program—an unarchiver, perhaps, or in this case Git itself—extract the files from the archive:

tar -xf foo.tar

or:

unzip foo.zip

or:

git checkout main

All three of these operations fundamentally work by unpacking an archive, giving you a bunch of files to work with. The git checkout variant simply first erases the previous checkout, then extracts the commit you asked for.¹ This fills in a work area, which we call the working tree or work-tree. That area is now full of ordinary, everyday files. These files are not actually in Git even though they just came out of Git. They're just ordinary everyday files, and you can do anything you want with or to them.

As a side effect that's super-convenient for Git (albeit perhaps not so much for you sometimes), when Git extracts the commit and fills in your working tree, Git also fills in its index. So, right after git checkout main, Git's index holds all the same files that it got from the latest main-branch commit that it put in your working tree.
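A rough way to see this, assuming a scratch repository with two made-up commits, is to check out the older commit and compare the blob IDs in the commit with the blob IDs in the index. The detached checkout is only there so the sketch does not depend on a branch name:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo '{}' > package-lock.json
git add package-lock.json
git commit -q -m "first"
echo 'console.log("hi")' > index.js
git add index.js
git commit -q -m "second"

# Check out the older commit; Git refills the working tree and the index.
git checkout -q --detach HEAD~1

# Blob ID and path for every file in the commit's snapshot ...
in_commit=$(git ls-tree -r HEAD | awk '{print $3, $4}')
# ... and blob ID and path for every entry in the index.
in_index=$(git ls-files -s | awk '{print $2, $4}')
echo "commit: $in_commit"
echo "index:  $in_index"
```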

As you do your work, you can modify working tree files. You can also create new files that Git didn't extract in the first place. You can also remove files, whether or not they're files that Git extracted. So your working tree "drifts away" from the commit that Git extracted.

Git's index, however, still holds all those files. They're already in the special internal de-duplicated form that Git uses to make commits, which makes it fast for Git to make the next commit.

If and when you run git add on some file, Git will:

read the working tree version of that file;
compress it down to the internal de-duplicated format, and check for duplicates (which Git can then re-use instead of the just-compressed file); and
kick out the old copy of the file from the index, if it was there before, and put this compressed, ready-to-commit file into the index.
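The steps above can be observed with a small sketch (scratch repository and file contents are assumptions for illustration). git hash-object reports the blob ID Git would give the working tree content, and git ls-files -s shows the blob ID the index currently holds:

```shell
set -e
cd "$(mktemp -d)"
git init -q
echo '{"lockfileVersion": 2}' > package-lock.json

# The blob ID Git would assign to the working tree content.
worktree_id=$(git hash-object package-lock.json)

git add package-lock.json
# The blob ID now recorded in the index for that path.
index_id=$(git ls-files -s package-lock.json | awk '{print $2}')

# Edit the working tree copy WITHOUT running git add again.
echo '{"lockfileVersion": 3}' > package-lock.json
after_edit_index_id=$(git ls-files -s package-lock.json | awk '{print $2}')
after_edit_worktree_id=$(git hash-object package-lock.json)

echo "index before edit: $index_id"
echo "index after edit:  $after_edit_index_id"
echo "worktree now:      $after_edit_worktree_id"
```

The index copy stays at the old content until the next git add, which is why an edit you forgot to add does not make it into the next commit.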

If you run git rm on some file, Git will remove that file from your working tree and from Git's index. If you run git rm --cached on some file, Git will remove the file from Git's index, but not from your working tree.²

Before you run git add, the index held a proposed commit, ready to be committed. After you run git add or git rm, the index still holds a proposed commit, ready to be committed. The result is this: At all times, Git's index holds your proposed next commit. What your git add and/or git rm commands do is update the proposed commit.
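Here is a small sketch of the difference between the two removal commands, using two made-up files a.txt and b.txt in a scratch repository:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo a > a.txt
echo b > b.txt
git add a.txt b.txt
git commit -q -m "add two files"

git rm -q a.txt            # removes a.txt from the index AND the working tree
git rm -q --cached b.txt   # removes b.txt from the index only

index_now=$(git ls-files)  # nothing is left in the index
if [ -e a.txt ]; then a_on_disk=yes; else a_on_disk=no; fi
if [ -e b.txt ]; then b_on_disk=yes; else b_on_disk=no; fi
echo "index: '$index_now'  a.txt on disk: $a_on_disk  b.txt on disk: $b_on_disk"
```

Both files are now gone from the proposed next commit, but b.txt is still sitting in the working tree as an untracked file.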

¹Actually, it isn't a simple erase-and-extract. Instead, it's a very complicated one where Git avoids erasing-and-extracting all the files it can possibly avoid touching, which makes things go fast. But the way Git does this is, in part, by using what's in Git's index. We're trying to explain the index, so we have to avoid thinking too much about the index here and start with the simpler model: remove / erase, then install.

²In a weird quirk, git add somefile will notice if you've removed somefile from your working tree, and act like git rm --cached somefile. So you can use git add to remove files from the index. I personally can never quite get used to this, and always use git rm instead, but be aware of this quirk, that git add sometimes means "remove". One way to remember it is that it really means make the index copy match the working tree copy, and if you've removed the working tree copy, it does that to match.
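A quick sketch of that quirk, assuming Git 2.0 or newer and a made-up file a.txt in a scratch repository:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo a > a.txt
git add a.txt
git commit -q -m "add a.txt"

rm a.txt          # plain rm: only the working tree copy goes away
git add a.txt     # "make the index match the working tree": stages the removal

staged_deletions=$(git diff --cached --name-only --diff-filter=D)
index_now=$(git ls-files)
echo "staged for deletion: $staged_deletions"
```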

Review

So let's review:

A commit holds a snapshot and metadata. The snapshot in the commit can never be changed, and it exists as long as the commit itself also exists.
Git's index holds a proposed next commit: a set of files ready to be committed.
The git checkout and git switch commands work by filling in your working tree and Git's index from some commit. So now there are some files—those extracted from the commit—and those files are tracked because they're in Git's index, as well as in your working tree. There may be other files in your working tree that were untracked before this and are still untracked, too. You can't tell just by looking at your working tree files!
Using git add and/or git rm, you can adjust the contents of files in Git's index and even adjust which files are in Git's index. So you can make files become tracked or untracked at this point. The purpose of doing this is to get ready for git commit.

Eventually, after doing all these adjustments, you will run git commit. This will make a new snapshot, and the files that are in this snapshot are exactly those files that are in Git's index: no more, no less, and no different. Because the index holds files that are pre-de-duplicated and all ready for committing, git commit itself is really fast, compared to historical version control systems.³

You can use git commit -a to pretend that you don't need to learn about the index, but this fails in several cases. In particular it won't ever add any totally-new files: it's more or less equivalent to running git add -u && git commit. You can use git commit --only <file> and git commit --include <file> as well, and to understand how these work, you need to know about Git's index. And, of course, to understand an untracked file in the first place, you need to know about Git's index.
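To see the "won't add totally-new files" part in action, here is a sketch with two hypothetical files, one already tracked and one brand new:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo v1 > tracked.txt
git add tracked.txt
git commit -q -m "initial"

echo v2 > tracked.txt        # modify a tracked file
echo new > brand-new.txt     # create a file Git has never seen
git commit -q -a -m "commit with -a"

committed=$(git ls-tree -r --name-only HEAD)              # files in the new snapshot
left_behind=$(git ls-files --others --exclude-standard)   # still-untracked files
echo "in commit: $committed"
echo "still untracked: $left_behind"
```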

When you run git status, Git will compare what's in the current commit to what's in the index, and then separately, compare what's in the index to what's in your working tree. To understand the output from git status, you need to know about the index. Git calls it the staging area here, which refers to how you use it, and is in many ways a better name than "the index", but a bunch of older Git commands still use "index". There's even a third name for this thing: Git sometimes uses the word cache. You see this in git rm --cached.⁴
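The two comparisons can be run by hand, which may make the git status output easier to read. In this sketch (file names are made up), git diff --cached does the commit-vs-index comparison and plain git diff does the index-vs-working-tree one:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo v1 > staged.txt
echo v1 > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m "initial"

echo v2 > staged.txt
git add staged.txt           # index now differs from the current commit
echo v2 > unstaged.txt       # working tree now differs from the index

head_vs_index=$(git diff --cached --name-only)   # "changes to be committed"
index_vs_worktree=$(git diff --name-only)        # "changes not staged for commit"
echo "HEAD vs index:         $head_vs_index"
echo "index vs working tree: $index_vs_worktree"

git status --short           # the same information, one line per file
```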

³In traditional pre-Git version control systems, their "commit" verb meant go figure out what to commit, which could take many seconds, or minutes, or in some extreme cases, hours. Git makes it possible to put millions of files in a repository without that kind of pain, although when you get to this level you have other kinds of pain.

⁴I don't know why there is no git rm --staged. It seems to me that when git diff gained --staged as a synonym for --cached, git rm should also have gained --staged as a synonym for --cached. But it didn't. Git's naming is atrocious. Fortunately, studies show that arbitrary names aren't that bad if you use them a lot. For more thoughts, see, e.g., https://smallstep.com/blog/the-poetics-of-cli-command-names/ and Why are UNIX/POSIX system call namings so illegible?

Commit-vs-commit comparisons

Now, once we have a repository chock full of commits, we often like to see a commit as changes since the previous commit. Even though each commit holds a snapshot (not changes), Git can easily show us changes, because each commit remembers (via the metadata I wasn't going to mention) which commit comes before that particular commit.

So, git show commit simply extracts, to a temporary memory area, two commits: the one we asked about, and the one before it. Then it compares the two snapshots. For two absolutely-identical files (which were automatically de-duplicated), it says nothing at all—and in fact, due to the way it stores commits, it can avoid extracting these files in the first place. For two files where the contents differ between the old and new, Git plays a game of Spot the Difference, and tells us what it discovered. Et voila, we have a diff, even though the commit contains a snapshot.
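A small sketch of snapshot versus diff, with two hypothetical files where only one changes between commits. git ls-tree shows the full snapshot, while comparing the commit against its parent shows only what differs:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo one > a.txt
echo two > b.txt
git add a.txt b.txt
git commit -q -m "first"

echo changed > b.txt
git commit -q -a -m "second"

snapshot=$(git ls-tree -r --name-only HEAD)    # every file in the commit
changed=$(git diff --name-only HEAD~1 HEAD)    # only what differs from the parent
echo "snapshot:" $snapshot
echo "changed:  $changed"

git show --name-status HEAD    # the same parent comparison, as git show prints it
```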

If the difference includes completely deleting some file, we get the thing you show in your question:

D    frontend/package-lock.json

This means the previous commit has the file, and the commit we're asking Git to summarize lacks the file. In effect, the file has been deleted.

If you check out the previous commit, it has the file, so the file will go into your working tree and into Git's index and now it is a tracked file. If you made the subsequent commit—the one in which it's deleted—you must have told Git to delete the index copy, perhaps with git rm --cached:

I accidentally made a commit and pushed with a delete on that file (it's still here locally)

This is precisely what you'd get with git rm --cached followed by git commit: you first have Git remove the file from Git's index, then you have Git make a new commit from Git's index, and the new commit omits the file. But git rm --cached never touched the working tree copy, and since it's now untracked—ever since the git rm --cached step—it just lies around in your working tree, taking up space and looking cute.
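That sequence is easy to reproduce in a scratch repository. This is only a guess at what happened in your case, but it produces the same D line and the same "still here locally" result:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

mkdir frontend
echo '{}' > frontend/package-lock.json
git add frontend/package-lock.json
git commit -q -m "track lockfile"

git rm -q --cached frontend/package-lock.json   # index only; file stays on disk
git commit -q -m "accidental delete"

deleted=$(git diff --name-only --diff-filter=D HEAD~1 HEAD)
untracked=$(git ls-files --others)
git show --name-status --format=%s HEAD          # prints the D line
echo "deleted in commit: $deleted"
echo "untracked on disk: $untracked"
```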

But:

... anyone pulling that commit will have his package-lock.json file deleted

That's correct. That's because they're using a commit where the file exists, and they have the file in both Git's index and their working tree, and they've now asked Git to switch to, or merge work from, a commit where the file has been deleted. So Git dutifully deletes the tracked file from both their index and their working tree.
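Here is a sketch of that effect using two local repositories as stand-ins: "alice" plays your repository and "bob" plays a colleague. Both names are made up, and bob pulls straight from alice's directory rather than from a shared server:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q alice
cd alice
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"
echo '{}' > package-lock.json
git add package-lock.json
git commit -q -m "track lockfile"

cd "$work"
git clone -q alice bob        # bob checks out the commit that has the file

cd "$work/alice"
git rm -q --cached package-lock.json
git commit -q -m "accidental delete"

cd "$work/bob"
git pull -q --ff-only         # bob moves to the commit that lacks the file

if [ -e "$work/alice/package-lock.json" ]; then alice_has=yes; else alice_has=no; fi
if [ -e "$work/bob/package-lock.json" ]; then bob_has=yes; else bob_has=no; fi
echo "alice still has the file: $alice_has"
echo "bob still has the file:   $bob_has"
```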

(They can prevent this by first deleting the tracked file from Git's index, with git rm --cached, to make it become an untracked file, and then checking out the target commit. It's harder to do this with a true merge, though, and even where it does work for them, they have to know to do the git rm --cached first.)

although it is supposedly untracked and still want it untracked of course.

You can't have this. Either the file is in a commit, and then it will be tracked on checkout, or it's not in a commit, and then you have the situation you have now.

What to do about this

The situation is untenable. Your best approach is: don't get into it in the first place.

If package-lock.json should be committed, keep it committed; don't have it as an untracked file. Treat the existing commit in which the file is absent as "poison" and avoid it (or try to get rid of it entirely).
If it really shouldn't be committed, but you need its data, make some other file name hold the data. That is, rename package-lock.json to package-lock.json.template or something.
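For the first option, one possible way to get back to "committed and tracked" without rewriting history is simply to add the file again in a new commit. This is a sketch in a scratch repository, not necessarily the only or best route:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo '{"lockfileVersion": 2}' > package-lock.json
git add package-lock.json
git commit -q -m "track lockfile"

# Reproduce the accident: drop the file from the index only, then commit.
git rm -q --cached package-lock.json
git commit -q -m "accidental delete"

# The working tree copy still exists, so it can be added again.
# (If it were gone: git checkout HEAD~1 -- package-lock.json would fetch the
# version from the commit before the accident.)
git add package-lock.json
git commit -q -m "restore lockfile"

tracked=$(git ls-files)
diff_vs_good=$(git diff --name-only HEAD~2 HEAD)   # empty if fully restored
echo "tracked again: $tracked"
```

The commit without the file still sits in the history, so anyone who checks out that particular commit will still see the file vanish.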

(In Git, a "renamed" file is just a removed file plus a newly added file where the new file's content is de-duplicated from the old, now-removed file's content. So it doesn't matter how you get there, you just have to create a new file that has the old content. You can use git mv to get there, if that's a convenient way, or you can use any other way to get there.)
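A sketch of that claim in a scratch repository: after git mv, the new name points at exactly the same stored blob as the old name did, and Git reports a rename only because it notices the matching content when comparing the two commits:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

printf '{\n  "lockfileVersion": 2\n}\n' > package-lock.json
git add package-lock.json
git commit -q -m "track lockfile"
old_blob=$(git rev-parse HEAD:package-lock.json)

git mv package-lock.json package-lock.json.template
git commit -q -m "keep the data under a template name"
new_blob=$(git rev-parse HEAD:package-lock.json.template)

# -M asks for rename detection; --diff-filter=R keeps only detected renames.
renamed_to=$(git diff -M --diff-filter=R --name-only HEAD~1 HEAD)
echo "old blob: $old_blob"
echo "new blob: $new_blob"
echo "detected rename to: $renamed_to"
```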

CodePudding user response：

Fixed this by soft resetting to the previous commit (git reset --soft HEAD~1), then stashing the changes (git stash save "commit fix"), and then force pushing (git push -f).

After that I cherry picked files from the stash (honestly I don't know the cmd for this, I used Sublime Merge for it) and committed and pushed again.
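On the command line, one possible way to pull individual files out of a stash is git checkout with the stash as the source. This sketch uses two made-up files and git stash push -m (the newer spelling of git stash save); it only shows the picking step, not the reset or the force push:

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

echo a1 > a.txt
echo b1 > b.txt
git add a.txt b.txt
git commit -q -m "initial"

echo a2 > a.txt
echo b2 > b.txt
git stash push -q -m "commit fix"      # both edits are now only in the stash

# Take a.txt, and only a.txt, from the stash into the index and working tree.
git checkout 'stash@{0}' -- a.txt

a_now=$(cat a.txt)
b_now=$(cat b.txt)
staged=$(git diff --cached --name-only)
echo "a.txt: $a_now  b.txt: $b_now  staged: $staged"
```

The stash itself is left in place, so the remaining files can be picked later or the stash dropped once it is no longer needed.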

So all in all, this deletes the last commit and then you can correct it in a new one.