Why is GitHub not deleting the files that I moved?

Time:10-07

So I was working with a Git remote, pushing from my local machine to GitHub. When I made my first commit, the directory was web/main, and I pushed it to GitHub. After that I changed my mind, so I moved the main folder into a new folder called backend, and the path changed to web/backend/main. Keep in mind that I moved it, so web/main should not exist anymore, and it doesn't exist on my local machine, but on GitHub the folder is still there. Is this intentional?

git status:

$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean

git push:

$ git push
Everything up-to-date

EDIT:

I should probably have pointed this out earlier, but in the first commit there's also a single Python file, manage.py, in web/ alongside the main folder.

IMSoP also pointed out that on GitHub, manage.py shows as renamed while the other files were being copied. So I checked the first commit on my local machine and, weirdly enough, I couldn't find the main folder, which is strange because it is there in the first commit on GitHub.

Thank you in advance

Answer:

Git doesn't store directories. It only creates them implicitly when it extracts files whose paths contain them. I suppose GitHub's web view of a branch works like your normal repository: updating it removes the files but not any resulting empty directories. Your local directory was not removed by Git but by you, and then you committed the changes. The change shows a removed file and an added file.

Answer:

Cloning your repository (https://github.com/happyprogrammer-code/gittest-django-simpleapp), I find that there are two commits in it:

$ git log --all --decorate --oneline --graph
* fbef865 (HEAD -> master, origin/master, origin/HEAD) organizing the files in to backend file
* 018aed1 first commit

Here's what's in commit 018aed1:

$ git ls-tree -r HEAD^
100644 blob b7c617c5cf9a85407c312d9e32da17de165cf81c    .gitignore
100644 blob 94ff3859fbf86be732f633254a110a860a8bbb65    requirements.txt
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    web/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99    web/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b    web/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788    web/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a    web/main/wsgi.py
100644 blob fbda2b3121972503edd5072a6f077c6f744abe77    web/manage.py

Here's what's in commit fbef865:

$ git ls-tree -r HEAD
100644 blob b7c617c5cf9a85407c312d9e32da17de165cf81c    .gitignore
100644 blob 94ff3859fbf86be732f633254a110a860a8bbb65    requirements.txt
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    web/backend/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99    web/backend/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b    web/backend/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788    web/backend/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a    web/backend/main/wsgi.py
100644 blob fbda2b3121972503edd5072a6f077c6f744abe77    web/backend/manage.py
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    web/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99    web/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b    web/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788    web/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a    web/main/wsgi.py

So the reason you see both sets of files on GitHub is that your most recent commit contains both sets of files.
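If you'd like to check this without clicking around GitHub, here is a small Python sketch. It takes the two `git ls-tree -r` listings above, pasted in as strings, and compares them by blob hash. The names `parse_ls_tree` and `classify` are made up for this example. The rename/copy guess is a crude exact-content match, not Git's actual rename-detection algorithm.

```python
from __future__ import annotations

# `git ls-tree -r` output for the two commits, copied from the listings above.
FIRST_COMMIT = """\
100644 blob b7c617c5cf9a85407c312d9e32da17de165cf81c    .gitignore
100644 blob 94ff3859fbf86be732f633254a110a860a8bbb65    requirements.txt
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    web/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99    web/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b    web/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788    web/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a    web/main/wsgi.py
100644 blob fbda2b3121972503edd5072a6f077c6f744abe77    web/manage.py
"""

SECOND_COMMIT = """\
100644 blob b7c617c5cf9a85407c312d9e32da17de165cf81c    .gitignore
100644 blob 94ff3859fbf86be732f633254a110a860a8bbb65    requirements.txt
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    web/backend/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99    web/backend/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b    web/backend/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788    web/backend/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a    web/backend/main/wsgi.py
100644 blob fbda2b3121972503edd5072a6f077c6f744abe77    web/backend/manage.py
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    web/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99    web/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b    web/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788    web/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a    web/main/wsgi.py
"""


def parse_ls_tree(listing: str) -> dict[str, str]:
    """Hypothetical helper: map each path in a `git ls-tree -r` listing to its blob hash."""
    entries = {}
    for line in listing.splitlines():
        if line.strip():
            _mode, _kind, blob_hash, path = line.split(maxsplit=3)
            entries[path] = blob_hash
    return entries


def classify(old: dict[str, str], new: dict[str, str]) -> dict[str, list]:
    """Hypothetical helper: sort paths into removed / added / kept, then guess renames vs copies.

    Exact-content matching only: an added path whose blob matches a removed
    path looks like a rename; one whose blob matches a path that is still
    present looks like a copy.
    """
    removed = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    kept = sorted(old.keys() & new.keys())
    removed_by_hash = {old[p]: p for p in removed}
    kept_hashes = {old[p] for p in kept}
    renames = [(removed_by_hash[new[p]], p) for p in added if new[p] in removed_by_hash]
    copies = [p for p in added if new[p] not in removed_by_hash and new[p] in kept_hashes]
    return {"removed": removed, "added": added, "kept": kept,
            "renames": renames, "copies": copies}


result = classify(parse_ls_tree(FIRST_COMMIT), parse_ls_tree(SECOND_COMMIT))
for key, value in result.items():
    print(f"{key}: {value}")
```

It reports web/manage.py as renamed to web/backend/manage.py. It reports the five web/backend/main files as copies of paths that still exist, which is consistent with what the question's edit describes.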

Keep in mind I moved [web/main to web/backend/main] so web/main should not exist anymore and it didn't exist in my local machine

You need to be careful here, because Git does not use the files in your working tree to build new commits. Git uses, instead, the files that are in Git's index. I believe I know precisely what happened, but before I suggest what I think happened, let's describe Git's index.

The index

Git's index is quite central to Git, yet it's largely invisible. If you look in a (non-bare) repository, you see a bunch of ordinary files, plus—if you look closely—a .git directory. The .git directory contains Git's databases and ancillary control files:

$ ls .git
branches        HEAD            info            packed-refs
config          hooks           logs            refs
description     index           objects

The big database (of all of Git's objects) is in .git/objects, and the smaller ones (e.g., branch and other names) are in .git/packed-refs and .git/refs and other files. That file named .git/index is where Git's index is being stored at the moment,1 but it's a binary file:

$ file .git/index
.git/index: Git index, version 2, 13 entries

What's really in that file is messy, and you're supposed to use the plumbing commands git ls-files and git update-index to manipulate it, if you ever need to go that low-level, but to describe Git's index—also known as the staging area, and sometimes as the cache—in as few words as possible, I like this phrase: The index holds your proposed next commit.

Git will fill in this index from a commit when you check that commit out, with git checkout or git switch. The actual contents are just index entries, which you can see with git ls-files --stage; each one has a mode, a hash ID (usually a blob hash ID), a staging number (usually zero), and a pathname such as web/main/__init__.py. Note that these file names contain embedded (forward) slashes: there are no folders or directories at this level, just files whose names have slashes in them.2
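A toy Python model of what git ls-files --stage shows may make the "no directories" point concrete. The class `IndexEntry` and the function `implied_directories` are hypothetical names for illustration. Git's real index is a binary file whose entries carry more fields, such as timestamps, sizes and flags.

```python
from __future__ import annotations

from dataclasses import dataclass
from posixpath import dirname


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical model of one `git ls-files --stage` line (not Git's real format)."""
    mode: str       # e.g. "100644" for an ordinary, non-executable file
    blob_hash: str  # hash ID of the blob that holds the file's content
    stage: int      # 0 normally; 1-3 only during a conflicted merge
    path: str       # full path name, slashes and all


def implied_directories(entries: list[IndexEntry]) -> set[str]:
    """Directories a checkout would need to create for these entries.

    There are no directory entries: a directory "exists" only because
    some file's path name has a slash in it.
    """
    dirs: set[str] = set()
    for entry in entries:
        parent = dirname(entry.path)
        while parent:
            dirs.add(parent)
            parent = dirname(parent)
    return dirs


EMPTY = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
index = [
    IndexEntry("100644", EMPTY, 0, "web/main/__init__.py"),
    IndexEntry("100644", "fbda2b3121972503edd5072a6f077c6f744abe77", 0, "web/manage.py"),
]
print(sorted(implied_directories(index)))  # ['web', 'web/main']

# Drop every entry under web/main/ and that "directory" is simply gone.
index = [e for e in index if not e.path.startswith("web/main/")]
print(sorted(implied_directories(index)))  # ['web']
```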

Git will also fill in your working tree during this checkout. That gives you usable copies of files. The blob objects in the index are read-only, Git-ified, compressed and de-duplicated copies of the files' data. Note in the git ls-tree -r output above that various blob hash IDs repeat: the files' data are stored only once, in each of those blobs, even though the files are "in" two commits. In the second commit, two files both have hash ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. (This is in fact the hash ID of the empty file: every empty file, in every Git repository, has this particular hash ID.)
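You can reproduce that empty-file hash ID yourself. Git hashes a short header ("blob", the size, and a NUL byte) followed by the raw content. That is what git hash-object prints for a file. The Python sketch below assumes a repository using the default SHA-1 object format; repositories created with SHA-256 get different IDs. The function name `blob_hash` is just a label for this example.

```python
import hashlib


def blob_hash(data: bytes) -> str:
    """Hash ID Git assigns to `data` as a blob, assuming a SHA-1 repository.

    The ID covers a header ("blob <size>" plus a NUL byte) and the raw
    content; compression happens later and doesn't change the ID.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


print(blob_hash(b""))                          # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(blob_hash(b"abc") == blob_hash(b"abc"))  # True: same content, same ID
```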

As you work in your working tree, you'll change the content of some files. You might even rename files, or shuffle the tree around. None of this affects Git's index, which holds independent copies of the files. Eventually, though, you'll want to put the new content into a new commit. Now you must run git add.

What git add does is read the working tree file, compress and de-duplicate it, and update Git's index. If the new content matches some existing blob, the hash IDs match, and Git just re-uses the existing blob. If the new content is truly new, Git stores that into the objects database and puts the new hash ID into the index. Either way the updated file is now ready to go into the new commit.

When you run git commit, Git simply packages up all the files that are in the index at that time, in the form they have at that time, to make the new commit. So whatever is in the index is now in the new commit.
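Here is a rough Python sketch of the add-then-commit cycle just described. `ToyRepo` and its methods are hypothetical. It models only blobs and the index; a real commit also writes tree objects and metadata, and real objects are compressed. It does show the de-duplication: two paths with identical content share one stored blob.

```python
from __future__ import annotations

import hashlib


def blob_hash(data: bytes) -> str:
    """Blob hash ID as Git computes it in a SHA-1 repository."""
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()


class ToyRepo:
    """Hypothetical stand-in for a repository: an object store plus an index."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}  # plays the role of .git/objects
        self.index: dict[str, str] = {}      # plays the role of .git/index (path -> hash)

    def add(self, path: str, data: bytes) -> bool:
        """Toy `git add`: store the blob if it's new, then point the index at it.

        Returns True if a new blob had to be stored, False if one was re-used.
        """
        h = blob_hash(data)
        is_new = h not in self.objects
        if is_new:
            self.objects[h] = data  # real Git also zlib-compresses the object
        self.index[path] = h
        return is_new

    def commit(self) -> dict[str, str]:
        """Toy `git commit`: snapshot exactly what the index holds right now."""
        return dict(self.index)


repo = ToyRepo()
first = repo.add("web/main/__init__.py", b"")           # True: new blob stored
second = repo.add("web/backend/main/__init__.py", b"")  # False: blob re-used
snapshot = repo.commit()
print(first, second, len(repo.objects), len(snapshot))  # True False 1 2
```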


1There's an environment variable, GIT_INDEX_FILE, that gives the path name of the principal index file. That index file can refer to other files, and many details are not promised any one way or another, but the Git plumbing commands will obey GIT_INDEX_FILE if you set it, which allows you to use alternative index files. The old git stash script, back when git stash was a script, used this, for instance.

2This is why Git is unable to store empty directories. Storing a gitlink in the index is possible, and the presence of a gitlink makes Git make an empty directory, so that's one way to cheat: a gitlink is half of a submodule, so storing half-a-submodule in the index lets you make Git make an empty directory. But you need the other half to make the submodule complete, to keep Git happy. See also How can I add a blank directory to a Git repository?


Removing or renaming files

If you remove or rename a file in your working tree, nothing happens in Git's index yet. You must tell Git to update its index.

The commands to do this include git add as before, but also git rm and git mv. The only one you actually need is git add because git add can remove a file from Git's index. Let's create a file in Git's index:

$ touch foo
$ git add foo
$ git ls-files --stage foo
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   foo

(there's that empty-file hash ID again). Now we'll remove the file from the working tree, and see that it remains in Git's index, but then we'll run git add and see what happens:

$ rm foo
$ git ls-files --stage foo
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   foo
$ git add foo
$ git ls-files --stage foo
$ 

So git add will "stage a deletion": if the file is gone from the working tree, git add removes the file from the index too.
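The transcript above can be mirrored with plain Python dicts standing in for the index and the working tree. `toy_git_add` is a hypothetical, simplified model. Real git add also handles directories and pathspecs, and it reports an error for a path that is in neither place.

```python
from __future__ import annotations

import hashlib


def blob_hash(data: bytes) -> str:
    """Blob hash ID as Git computes it in a SHA-1 repository."""
    return hashlib.sha1((b"blob %d\0" % len(data)) + data).hexdigest()


def toy_git_add(index: dict[str, str], worktree: dict[str, bytes], path: str) -> None:
    """Toy `git add <file>` on plain dicts (hypothetical model).

    If the file is in the working tree, stage its current content; if it
    has vanished, stage the deletion by dropping the index entry.
    """
    if path in worktree:
        index[path] = blob_hash(worktree[path])
    else:
        index.pop(path, None)


worktree = {"foo": b""}              # touch foo
index: dict[str, str] = {}
toy_git_add(index, worktree, "foo")  # git add foo
print(index)                         # {'foo': 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'}

del worktree["foo"]                  # rm foo
print("foo" in index)                # True: the index still has it
toy_git_add(index, worktree, "foo")  # git add foo
print("foo" in index)                # False: the deletion is staged
```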

So, why do git rm and git mv exist? They're there for convenience:

$ touch foo
$ git add foo
$ git rm foo
error: the following file has changes staged in the index:
    foo
(use --cached to keep the file, or -f to force removal)
$ git rm -f foo
rm 'foo'

The git rm command removes the file from both the index and your working tree. As you can see, in some cases it complains and does nothing, and you need -f or --force to make it go ahead. Or:

$ touch foo
$ git add foo
$ rm foo
$ git rm foo
rm 'foo'

If the file is already gone from the working tree, git rm acts a lot like git add, simply removing it from the index. Last, there's git rm --cached. This is where the old name, cache, turns up. This variant removes the file from Git's index / staging-area, without touching your working tree.
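In the same toy dict model, the difference between git rm and git rm --cached comes down to whether the working-tree copy is touched. `toy_git_rm` is hypothetical and skips the safety checks that real git rm performs (the ones -f overrides).

```python
from __future__ import annotations

EMPTY = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"


def toy_git_rm(index: dict[str, str], worktree: dict[str, bytes], path: str,
               cached: bool = False) -> None:
    """Toy `git rm [--cached] <file>` on plain dicts (hypothetical model).

    Skips the safety checks real `git rm` performs (the ones -f overrides).
    """
    index.pop(path, None)         # both forms remove the index entry
    if not cached:
        worktree.pop(path, None)  # plain `git rm` also deletes the working-tree file


index = {"foo": EMPTY}
worktree = {"foo": b""}
toy_git_rm(index, worktree, "foo", cached=True)
after_cached = ("foo" in index, "foo" in worktree)
print(after_cached)  # (False, True): foo is now an untracked file

index["foo"] = EMPTY
toy_git_rm(index, worktree, "foo")
after_plain = ("foo" in index, "foo" in worktree)
print(after_plain)   # (False, False)
```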

Similarly, git mv is there for convenience. Git does not store renames: each commit just has a full snapshot of every file. Git will, later, at git diff time, guess about renames: if the commit on the "left side" of a diff has some file X with some content, and no file Y, and the commit on the "right side" of a diff has some file Y with the same content X used to have, but has no file X, why then, probably file X got renamed to file Y. But to get there, we have to:

  1. copy the renamed file's content-hash (and mode) to the new name in the index;
  2. remove the old-named file's mode-and-hash from the index; and
  3. rename or otherwise update the file in the working tree, according to whatever this OS's file system rules are.

We can do all three of these at once with git mv. Or, we can rename the file with the OS's file-renamer, mv or rename or whatever it may be, and then run git add twice, or git rm and git add, or whatever. But running one git mv command is a little more convenient.
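Here are the three steps listed above, written as one hypothetical Python function over the same kind of toy dicts. `toy_git_mv` is not Git's implementation. It shows only that a "move" is an index copy, an index removal and an OS rename done together. The file content is a made-up placeholder.

```python
from __future__ import annotations


def toy_git_mv(index: dict[str, tuple[str, str]], worktree: dict[str, bytes],
               old: str, new: str) -> None:
    """Toy `git mv old new` on plain dicts (hypothetical model).

    index maps path -> (mode, blob hash); worktree maps path -> content.
    """
    index[new] = index[old]            # 1. copy mode and hash to the new name
    del index[old]                     # 2. remove the old name from the index
    worktree[new] = worktree.pop(old)  # 3. rename the file in the working tree


index = {"web/manage.py": ("100644", "fbda2b3121972503edd5072a6f077c6f744abe77")}
worktree = {"web/manage.py": b"placeholder content"}  # made-up content for the demo
toy_git_mv(index, worktree, "web/manage.py", "web/backend/manage.py")
print(sorted(index), sorted(worktree))
# ['web/backend/manage.py'] ['web/backend/manage.py']
```

Running the OS mv first and only then git add on the new name covers steps 1 and 3. It skips step 2.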

So that's what git rm and git mv are for: they're more convenient ways to achieve the result we want, which is to say, manipulate some file(s) in both Git's index and our working tree. We don't need either one. We can do the file-manipulation with regular OS commands, because files in our working tree are regular OS files. Then we can update Git's index with git add, which can do everything.

What I think you did

By now you probably know what you did, but here's what I think you did (assuming a Unix-like system's shell commands and maybe with some minor differences):

mkdir web/backend
mv web/main web/backend/main
git add web/backend

This told Git to copy all the renamed files into Git's index under their new names. But Git never took out the old names, so those index entries, with their mode 100644 and blob <whatever> and stage-zero and path-name, remained in Git's index. Then you ran:

git commit

and Git packaged up the index contents, complete with duplicate copies of each file.
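To see how that sequence leaves duplicates behind, here is a hypothetical replay in Python. The index and the working tree are plain dicts mapping path to blob hash, with hash values taken from the ls-tree listing above. For brevity it leaves out manage.py, .gitignore and requirements.txt. It is a sketch of the suspected commands, not a record of what was actually typed.

```python
from __future__ import annotations

# Hypothetical replay of the suspected command sequence, on plain dicts.
# index and worktree both map path -> blob hash (values from `git ls-tree -r`).
HASHES = {
    "__init__.py": "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
    "asgi.py": "63356bd12bf3370b9b4e6411d6b24e1fe5681e99",
    "settings.py": "905914bb8d0aa8951948ba5f1a5167ea4627e35b",
    "urls.py": "e1637d6a6898f22e07fcfaee419c67565997e788",
    "wsgi.py": "e1c99c8fb064bb93d7710bdf80e80231fc95343a",
}

# State right after checking out the first commit.
index = {f"web/main/{name}": h for name, h in HASHES.items()}
worktree = dict(index)

# mkdir web/backend; mv web/main web/backend/main
# An OS-level rename: only the working tree changes, the index does not.
worktree = {p.replace("web/main/", "web/backend/main/", 1): h for p, h in worktree.items()}

# git add web/backend
# Stages the files under web/backend/; nothing here drops the web/main/ entries.
for path, h in worktree.items():
    if path.startswith("web/backend/"):
        index[path] = h

# git commit: the new commit is a snapshot of the whole index.
commit = dict(index)
print(len(commit))  # 10: five old names plus five new ones
print(sorted(p for p in commit if p.startswith("web/main/")))
```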

Use git status to view the interesting parts of the index

Your working tree contains fewer files than your index, and if you run git status, Git will run two git diff operations:

  • One will compare the HEAD commit to Git's index. Since the HEAD commit was made from Git's index, these two will match exactly and this git diff will have nothing to say, so git status won't print anything from it.

  • The second git diff --name-status operation will compare what's in Git's index to what's in your working tree. Since the index holds files that have mysteriously3 vanished from your working tree, Git will call these files Deleted. With git status --short they'll show up here as status letter "D", and without --short they'll be listed as changes not staged for commit, with a bunch of files being "deleted".
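Here is a rough Python model of those two comparisons, plus the untracked-file check covered next. `toy_status` is a hypothetical function. Its three arguments all map path to blob hash, and it ignores details like renames, modes and conflicts. The example state is the one described above: the index and HEAD still list web/main, but the working tree no longer has it.

```python
from __future__ import annotations


def toy_status(head: dict[str, str], index: dict[str, str],
               worktree: dict[str, str]) -> dict[str, list]:
    """Toy `git status`: all three arguments map path -> blob hash (hypothetical model).

    First diff: HEAD vs index ("changes to be committed").
    Second diff: index vs working tree ("changes not staged for commit").
    Anything in the working tree but not in the index is untracked.
    """
    staged = sorted(p for p in head.keys() | index.keys()
                    if head.get(p) != index.get(p))
    not_staged = sorted(("D" if p not in worktree else "M", p)
                        for p in index if worktree.get(p) != index[p])
    untracked = sorted(p for p in worktree if p not in index)
    return {"staged": staged, "not_staged": not_staged, "untracked": untracked}


ASGI = "63356bd12bf3370b9b4e6411d6b24e1fe5681e99"
index = {"web/main/asgi.py": ASGI, "web/backend/main/asgi.py": ASGI}
head = dict(index)                             # the commit was made from this index
worktree = {"web/backend/main/asgi.py": ASGI}  # web/main/ was moved away
report = toy_status(head, index, worktree)
print(report)
# {'staged': [], 'not_staged': [('D', 'web/main/asgi.py')], 'untracked': []}
```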

The git status command is therefore very helpful:

  • First, it tells you which branch you're "on", i.e., when you make a new commit, which branch name will be updated to hold that new commit's new hash ID.

  • Then, if the current branch has an upstream set, it tells you about the current branch as compared against its upstream. (If not, it skips over this part.)

  • Now it runs the two diffs. For the first diff, index files that match the HEAD commit are not very interesting, so it says nothing about those; but files that are gone, or are new, or differ, are interesting, and it lists those as staged for commit. For the second diff, files that match between the index and the working tree are dull, so it says nothing about them, but when files don't match, it tells you about the changed or deleted files.

There's something a little odd here. Files that are in your working tree, but aren't in Git's index, don't get listed in the above sections. Instead, this last group of files is segregated into its own section. These files are—by definition—your untracked files. Any file that exists in your working tree right now, but is not in Git's index right now, is an untracked file.

When working with Python code, we get byte-compiled .pyc or .pyo files (depending on optimization), which are stored in __pycache__ or next to the source (depending on Python version). These files clutter up your working tree, and if Git were to complain about them as untracked files, that could be pretty noisy.4 So Git lets you tell git status to shut up about certain untracked files. That's half of what .gitignore is about.

These untracked files should also stay untracked. That's the other half of what .gitignore is about: using an en-masse add operation, like git add ., will normally add every file to Git's index, including all the untracked ones. But the .pyc files shouldn't get added. By listing *.pyc in .gitignore, you tell git status to shut up about them, and git add not to add them.

There's one other thing you need to know about this though: once a file is in Git's index—however it got there: by git add or by being read out of some existing commit, that doesn't matter—once it's in there, the file is not ignored, even if it's listed in a .gitignore. You would have to remove it, e.g., git rm --cached, to get it out of Git's index to get it ignored again. Removing it from Git's index means it's now not in your next commit, once you make it, which means it won't be in Git's index when someone checks out that commit someday, too. This has consequences, especially when it's already in some other Git commit, where checking out that commit means it will be in Git's index. But worry about those later, once you have the rest of this index stuff down cold.
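Here is a small Python sketch of the rule that ignore patterns only apply to files that are not in the index. `toy_untracked_report` is hypothetical. Its matching is simplified to shell-style globs on the file's base name, which handles a pattern like *.pyc. It does not cover Git's full .gitignore syntax (negation, anchoring, directory-only patterns). The file names in the example are made up.

```python
from __future__ import annotations

from fnmatch import fnmatchcase
from posixpath import basename


def toy_untracked_report(index: set[str], worktree: set[str],
                         patterns: list[str]) -> list[str]:
    """Which untracked files would a toy `git status` list? (hypothetical model)

    Ignore patterns are consulted only for files that are NOT in the index.
    Matching is simplified to shell-style globs against the base name.
    """
    return sorted(
        p for p in worktree - index
        if not any(fnmatchcase(basename(p), pat) for pat in patterns)
    )


index = {"web/manage.py", "web/main/old.pyc"}  # old.pyc: added before *.pyc was ignored
worktree = index | {"web/main/__pycache__/urls.cpython-39.pyc", "notes.txt"}
print(toy_untracked_report(index, worktree, ["*.pyc"]))  # ['notes.txt']
print("web/main/old.pyc" in index)  # True: still tracked despite matching *.pyc
```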


3No longer mysterious, now that you know.

4With __pycache__ and the way git status normally summarizes some untracked files, you get just one line here, which isn't that annoying. With the old Python 2 scheme, you get one line per file, which is pretty annoying.
