Delete from github repository each file that has been deleted in local folder

Time:10-28

I'm new to git, and I can't find an answer to this, which is kind of weird. Here is my problem: I want to commit changes to all files within my folder called "folder1". Here is the content of "folder1":

folder1
    pyproject1.py
    pyproject2.py
    myimg.png

To do that, I commit my changes to my github repository, using the following command lines:

git add *
git commit -m"my changes"
git push origin main

But if I remove, for example, the file "myimg.png" from my local folder and run the same commands again:

git add *
git commit -m"my changes"
git push origin main

The file myimg.png isn't removed from the github repository. How can I make sure that every time I commit a change in my local folder, every file that is no longer in my local folder gets deleted from the repository?

CodePudding user response:

Technically, it is possible to remove a file with git add. This is a bit weird—well, it seems that way to me at least—since "add" seems to mean, well, add, not remove. But there's a much more direct command that literally means remove, which is git rm.

So, what you want is:

git rm myimg.png

which will remove the file from both your working tree and Git's index aka staging area (more about this in a moment), and then:

git add

any other updated files as usual, and then:

git commit -m "commit message"

(you can write the quotes the way you did; this is just my own preference, as -m takes one argument, which in the old 1980s days usually had to be a separate word like this, even though Git itself didn't come into existence until 2005).
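A minimal sketch of the sequence above, run in a throwaway repository so nothing real is touched. The temporary directory, the placeholder identity, and the file contents are assumptions made for illustration; the file names come from the question. The final git push origin main is left out because this scratch repository has no remote.

```shell
# Scratch repository in a temporary directory (assumption: git is installed).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Recreate the layout from the question and commit it.
mkdir folder1
echo 'print("one")' > folder1/pyproject1.py
echo 'print("two")' > folder1/pyproject2.py
echo 'fake image'   > folder1/myimg.png
git add folder1
git commit -q -m "my changes"

# Remove the file from the working tree and the index in one step,
# stage any other updated file as usual, then commit.
git rm -q folder1/myimg.png
echo 'print("one, edited")' > folder1/pyproject1.py
git add folder1/pyproject1.py
git commit -q -m "remove myimg.png, update pyproject1.py"

# List the files recorded in the newest commit.
tracked=$(git ls-tree -r --name-only HEAD)
echo "$tracked"
```

The listing should show the two Python files and no myimg.png, which is what a later push would send to the remote.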

Long and optional reading: what's really going on here

Whatever Git tutorial you've been using has done you a bit of a disservice here by not teaching you first about Git's oddities. Git is a distributed version control system (DVCS), which means extra oddities, but this contains the phrase "version control system", which itself has some basic ideas: any Version Control System in general needs to offer you a way to "get back" old versions. There are a lot of different ways to achieve this. Git, being one of the more modern VCSes (along with Mercurial and Subversion and Bazaar and many others), is based on the idea of commits. The commits act as the checkpoints: you can go back to any older commit, any time you like.

In order to make this work, Git stores every commit as a frozen-for-all-time full snapshot of every file (plus a bit more, that we won't cover here at all). To keep this from using up all your disk space instantly, Git uses a lot of clever tricks, including the idea of de-duplicating content. So if you make 100 commits, each of which contains a 100 megabyte file but the file itself is the same in all 100 commits, rather than making 100 copies of this file, Git has a single frozen copy that they all share. It's quite safe to share the single copy because no part of any commit can ever be changed once the commit is made.
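The de-duplication idea can be seen in a small way with git hash-object, which prints the ID under which Git would store a file's content. This is only a sketch: the file names and contents are made up, and it shows that identical content maps to one ID, not how Git packs objects on disk.

```shell
# Work in a temporary directory (assumption: git is installed).
work=$(mktemp -d) && cd "$work" || exit 1

printf 'same bytes\n'  > copy_a.txt
printf 'same bytes\n'  > copy_b.txt
printf 'other bytes\n' > different.txt

# git hash-object computes the object ID for each file's content.
id_a=$(git hash-object copy_a.txt)
id_b=$(git hash-object copy_b.txt)
id_c=$(git hash-object different.txt)

if [ "$id_a" = "$id_b" ]; then echo "identical content, one shared object ID"; fi
if [ "$id_a" != "$id_c" ]; then echo "different content, different object ID"; fi
```

Because the ID depends only on the content, two commits holding the same file can point at the same stored object.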

What this means in terms of removing a file is simple enough: the version control system needs to know that the next commit you make shouldn't have the file. If commit #3 has the file, and commit #4 lacks the file, then obviously the file was removed between step 3 and step 4.

The complications in Git come in at many points here: first, while commits in Git are numbered, the numbers themselves are weird. Each commit gets a unique number, and when I say unique, I don't mean "unique within some limit", I mean unique. No Git commit, ever, anywhere, in any repository, has used the number before. No future commit in any Git repository anywhere will ever use that number again![1] Git calls this "commit number" a hash ID,[2] and it's big and ugly and not suitable for human consumption, so aside from cut-and-paste with the mouse or whatever, we don't normally use these ourselves.

Still, that big ugly hash ID is the "true name" of each commit. By giving that hash ID to Git, you can get back any older commit, including files you've "deleted". So deleted files are still there in the repository. But this gives us a couple of obvious problems:
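As a sketch of "deleted files are still there", the following scratch repository deletes a file in a second commit and then reads it back out of the first commit by hash ID. The repository, identity, and file content are illustrative assumptions.

```shell
# Scratch repository (assumption: git is installed).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'fake image' > myimg.png
git add myimg.png
git commit -q -m "add image"
old_commit=$(git rev-parse HEAD)   # hash ID of the commit that has the file

git rm -q myimg.png
git commit -q -m "delete image"

# The newest commit no longer lists the file...
now_tracked=$(git ls-tree -r --name-only HEAD)
# ...but the older commit still holds it and can hand it back.
recovered=$(git show "$old_commit:myimg.png")
echo "recovered content: $recovered"
```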

  • Committed files are stored in a weird, frozen, Gitty fashion.
  • Only Git can read them and literally nothing, not even Git itself, can overwrite them.
  • That means we can't get any work done.

All version control systems have this problem, and almost all of them use the same solution.


[1] This is mathematically impossible, of course, so it's not entirely true. But if you bring together two unrelated Git repositories, and they've accidentally used the same number for two different commits, Git breaks. Ideally, this never happens, and in practice it doesn't.

[2] Hash ID or Object ID (OID), really, and Git used to call this an SHA1 as the existing hash function is SHA-1; see "How does the newly found SHA-1 collision affect Git?" As noted there, Git is slowly moving to SHA-256, hence the drive to stop using the term "SHA-1".


The "working tree"

In a commit-oriented system, you generally start by picking out some existing commit to check out, using some verb like checkout or switch or extract. Git uses git checkout or git switch.[2] The version control system then locates that commit, with all of its files, and extracts the files from the commit, into a work area. Git calls this work area your working tree or work-tree.[3] The commit you selected becomes your current commit, and in Git, the branch name you selected here becomes your current branch name. (There is a whole lot of special Git weirdness here, too, that I'm omitting for length reasons.)

So, after a check-out operation (or git switch command), your working tree is now full of all the files, as of whatever form they had at the time you, or someone else, made the commit you've just chosen to use. The files in your working tree are ordinary everyday files, usable by your ordinary everyday editor, your ordinary everyday Python or browser or whatever it is that uses them, and so on. So now you can get work done!

It's important to realize that these files are not in Git. They came out of Git just now, perhaps, but now that they are out, they're not "in". As you work on and with these files, Git is not even aware of that.[4] So when you're done messing with the files, it's important that you tell Git. You do this with git add and git rm, but here's where things get really weird. In other systems, like Mercurial, you use hg rm to remove a file, and you just use hg commit otherwise (no need to hg add all the time) because hg commit figures out what you changed. This isn't true in Git.


[2] The git switch command was new in Git 2.23, the result of splitting up an overly-complicated git checkout into git switch and git restore. You can now use either one of these, but if you have 2.23 or later, I recommend preferring git switch as it's usually less confusing (due to having the complications moved out to git restore).

[3] The OSes on which Git runs have the notion of a current working directory or cwd, often available with the pwd or Print Working Directory command or $PWD shell variable. Git used to mix together the cwd and work-tree terms in ways people found confusing. It can still be rather confusing, as Git uses the cwd to find the repository, which is stored in a hidden .git folder at the top level of the working tree. This makes some things kind of upside down: the repository is stored in the working tree! This is true even though the working tree is not part of the repository.

Git lets you peek inside the hidden folder if you like, though in general you should (a) not depend on its form and (b) not touch anything inside that folder. Git is very sensitive about the files in this hidden folder, and cloud-syncing software such as Dropbox or iCloud will eventually corrupt the repository. For this reason, it's unwise to store the working tree in a cloud-synced folder: the repository is inside the working tree, and is thus subject to this same sync, which breaks it.

[4] For speed reasons, modern Git is slowly acquiring a "file system monitor" setup that does make it somewhat aware of things happening here. The design of this must take into account the fact that old Git doesn't have this, and that on most systems, such monitors can lose information sometimes, so except for making things go faster, the FSMonitor is supposed to act like it isn't even there. If your system has the FSMonitor available (currently Windows and macOS do) and you turn it on and it misbehaves, just turn it off again. Linux support is in the pipeline.


Git's index or staging area

As I just mentioned, other version control systems like Mercurial just have you run hg commit. The command spends a bunch of time to figure out what you changed, in the working tree, and commits these changes. (Mercurial uses a changeset model for its commits, rather than Git's snapshot model.) Git is—different. Git forces you to run git add every time. Why? The answer lies in this extra thing that Git has, which is messy and big and has three names:

  • Git sometimes calls this the index, a sort of meaningless name. I like to use this term as it covers everything it does, but it's not very memorable.
  • Git sometimes calls this the cache, as a big part of its job is to make Git go fast. (hg commit can take minutes but git commit usually finishes in a few milliseconds, and only some of that is due to hg being in Python and Git being compiled. A lot of it is because of the index.)
  • Last, in more modern documentation, Git calls this the staging area, which refers to how you normally use it: to stage what will go into the next commit.

What's in the index can be described, pretty accurately, this way: The index holds your proposed next commit snapshot. That's it—that's the real key to the staging area, it's the proposed next snapshot—but that has a lot of consequence.

In particular, you now know (or should know) that the files inside any given commit are in some weird Git-ized format, not usable by non-Git software. So are the files in Git's index. The key difference between a committed copy of a file, and the index copy of a file, in Git, is that the index copy can be replaced. (The committed copy is frozen forever.)

What this means in turn is simple as well: You always have (up to) three copies of every "active" file. That is, suppose your existing commit has files README.md and folder1/pyproject1.py.[5] Then there are actually three copies of README.md and three copies of folder1/pyproject1.py.

One of these "active copies" is the frozen, current-commit copy. This copy can't be changed, as it's inside a commit. Another is the index or staging area copy. Initially, it's the same as the committed copy—and since Git's internal format is de-duplicated, it's been de-duplicated to literally use the original. But you can replace this with a new copy: this doesn't overwrite the original, it just adds a new version of the file—or finds another existing duplicate to re-use—and prepares that for committing. The third copy is the ordinary file, in your working tree.
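The three copies can be made visible by giving each one different content. The sketch below uses a scratch repository with made-up file contents; HEAD:README.md names the committed copy and :README.md names the index copy.

```shell
# Scratch repository (assumption: git is installed).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'version 1' > README.md
git add README.md
git commit -q -m "first"

echo 'version 2' > README.md
git add README.md              # the index copy is now "version 2"
echo 'version 3' > README.md   # the working tree copy moves on again

committed=$(git show HEAD:README.md)   # frozen copy in the current commit
staged=$(git show :README.md)          # copy in the index / staging area
working=$(cat README.md)               # ordinary file in the working tree
printf '%s | %s | %s\n' "$committed" "$staged" "$working"
# prints: version 1 | version 2 | version 3
```

Running git commit at this point would snapshot "version 2", since that is what the index holds.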

The git add command means: read the working tree copy and prepare it for committing, replacing the old index copy. The git rm command means: remove both the index copy and the working tree copy.

If you run git add removedfile, Git tries to read the working-tree copy of the removed file, and then, at that point, discovers that it's removed. So git add removedfile notices that, hey, the file is gone, but it's still in Git's index. Obviously you meant git rm removedfile! Git removes the file from its index, and quietly does nothing about the already-removed file in the working tree, and that's how git add can mean git rm.
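A sketch of that behaviour, assuming Git 2.0 or later (older versions handled removals in git add differently) and a scratch repository with made-up contents. In git status --porcelain output, the first column describes the index and the second the working tree.

```shell
# Scratch repository (assumption: git 2.0 or later is installed).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'fake image'   > myimg.png
echo 'print("one")' > pyproject1.py
git add myimg.png pyproject1.py
git commit -q -m "my changes"

rm myimg.png                        # plain OS delete; Git has not been told
before=$(git status --porcelain)    # " D myimg.png": deleted, not staged
git add myimg.png                   # the removed path is named explicitly
after=$(git status --porcelain)     # "D  myimg.png": deletion is staged

printf 'before: [%s]\nafter:  [%s]\n' "$before" "$after"
```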


[5] Note that to Git, this is a file named folder1/pyproject1.py. It's not a folder named folder1 containing a file named pyproject1.py. Git knows how to convert back and forth between the OS's requirement that files live in folders, and Git's requirement that things in the index have a single long file name with embedded forward slashes. But in fact the index can only hold files, which means you can't store an empty folder in Git. There is a trick or two here though: see "How do I add an empty directory to a Git repository?"


OK, but then why didn't git add * work?

We now come to a point where Windows CMD.EXE, and just about every other command line interpreter in use these days, differ.

In Unix-like shells—bash, csh, sh, tcsh, zsh, and the like—it's the shell—the command line interpreter or CLI—that handles *. You write:

*

or:

foo*

or:

folder1/*

and the shell finds all files in the current directory or all files starting with foo in the current directory or all files in the folder1/ folder and expands out their names.[6]

If you have removed a file, git add * does not list it, because the shell expanded the * to the set of files that are here. The removed file isn't here. So it isn't listed!
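The sketch below reproduces this in a scratch repository and then shows one common way around it, git add -A, which asks Git itself to stage every change including deletions. That flag is an addition here, not something the answer prescribes; it assumes Git 2.0 or later, and the file contents are made up.

```shell
# Scratch repository (assumption: a Unix-like shell and git 2.0 or later).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'print("one")' > pyproject1.py
echo 'print("two")' > pyproject2.py
echo 'fake image'   > myimg.png
git add pyproject1.py pyproject2.py myimg.png
git commit -q -m "my changes"

rm myimg.png
expanded=$(echo *)                  # what the shell turns * into
echo "shell passes: $expanded"      # myimg.png is not in the list

git add *                           # so the deleted file is never named
unstaged=$(git status --porcelain)  # " D myimg.png": still not staged

git add -A                          # Git scans for all changes itself
staged=$(git status --porcelain)    # "D  myimg.png": deletion is staged
```

After git add -A, the usual git commit and git push origin main would carry the deletion to the remote.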

On the ancient CMD.EXE, the CLI does not handle *. This passes a literal asterisk to Git. Because Git tries to accommodate these old systems, Git has its own globbing[7] code. This code uses what's in Git's index, so here, git add * will add the removal of the file!
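The same effect can be approximated in a Unix-like shell by quoting the asterisk, so that the shell leaves it alone and Git receives the literal character. This is a sketch under the assumption of Git 2.0 or later, in a scratch repository with made-up contents.

```shell
# Scratch repository (assumption: a Unix-like shell and git 2.0 or later).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'print("one")' > pyproject1.py
echo 'fake image'   > myimg.png
git add pyproject1.py myimg.png
git commit -q -m "my changes"

rm myimg.png
git add '*'                         # quotes stop the shell; Git does the matching
staged=$(git status --porcelain)    # deletion of myimg.png is now staged
echo "$staged"
```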

There is a whole lot more to this index, but I am trying to keep this answer relatively short. If this seems like a lot of information, well, remember, a good Git tutorial will have covered all of this by now, and this should all be review to you.


[6] There's a subtle catch here, controllable in some shells but not all: a file whose name is .hidden does not get found by *, except in bash if you turn on dotglob:

$ shopt dotglob
dotglob         off
$ echo *
cover.html cover.out Makefile ...
$ shopt -s dotglob
$ echo *
.git .gitignore .golangci.yml cover.html cover.out Makefile ...

Most people normally keep dotglob off so that things like .git stay hidden. But this means git add * won't add .gitignore, for instance.
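A short sketch of that consequence, assuming a shell with its default globbing settings (dotglob off) and a scratch repository with made-up contents. It contrasts git add * with git add ., where the dot is a directory name that Git walks itself, not a glob.

```shell
# Scratch repository (assumption: default shell globbing, git installed).
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

echo '*.pyc'        > .gitignore
echo 'print("one")' > pyproject1.py

git add *                            # shell expands * to pyproject1.py only
with_star=$(git status --porcelain)  # .gitignore is still untracked ("??")
git add .                            # Git walks the directory, dotfiles included
with_dot=$(git status --porcelain)   # both files now staged ("A ")

printf 'after git add *:\n%s\nafter git add .:\n%s\n' "$with_star" "$with_dot"
```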

[7] The expansion of * and other glob characters is referred to as globbing, and these characters are called glob characters, for historical reasons. Originally the shells didn't do this expansion themselves: they ran another program to do it. Early versions of Unix ran in well under 64k of RAM, on computers that maxed out at 64k or sometimes 128k. Think about that the next time you download a 50 megabyte app onto your 128 GB phone.

CodePudding user response:

You will find one way to delete files from Git in this answer.

Anyway, if you delete a file in your local copy, then commit and push, it should be deleted on your remote too. Please try to use git commit without the message argument. This will open a text editor where you can see your changes (this is actually not the way to look at changes, but it works very well in this case ;)). Then post a screenshot here.
