So I was working with a Git remote, pushing from my local machine to GitHub. When I made my first commit, the directory was web/main, and then I pushed it to GitHub. After that I changed my mind, so I moved the main folder into a new folder called backend, and the directory changed to web/backend/main. Keep in mind that I moved it, so web/main should not exist anymore, and it doesn't exist on my local machine, but on GitHub the folder remains. Is this intentional?
git status:
$ git status
On branch master
Your branch is up to date with 'origin/master'.
nothing to commit, working tree clean
git push:
$ git push
Everything up-to-date
EDIT:
I should probably have pointed this out earlier, but in the first commit there's a single Python file, manage.py, alongside the main folder in web/. IMSoP also pointed out that on GitHub manage.py is shown as renamed, while the other files are shown as copied. So I checked the first commit on my local machine and, oddly enough, I couldn't find the main folder, which is weird because it is there in the first commit on GitHub.
Thank you in advance.
Answer:
Git doesn't store directories. It only creates them implicitly when it extracts files whose paths contain them. I suppose GitHub's web view of a branch works like your normal repository: updating it removes the files but not any resulting empty directories. Your local directory was not removed by Git but by you, and then you committed the changes. The change shows a removed file and an added file.
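To make "directories are only implied by file paths" concrete, here is a toy model (not Git's actual code; the function name implied_directories is hypothetical). Given the list of file paths Git records, it derives which directories a checkout would have to create. Once no recorded path starts with web/main/, nothing implies that directory any more.

```python
from pathlib import PurePosixPath

def implied_directories(paths):
    """Return every directory needed to hold the given file paths.

    Toy model: Git records only file paths such as "web/main/urls.py";
    directories are derived from those paths when files are extracted.
    """
    dirs = set()
    for p in paths:
        # PurePosixPath("web/main/urls.py").parents -> web/main, web, .
        for parent in PurePosixPath(p).parents:
            if str(parent) != ".":
                dirs.add(str(parent))
    return sorted(dirs)

print(implied_directories(["web/main/urls.py", "web/manage.py"]))
# ['web', 'web/main']
print(implied_directories(["web/backend/main/urls.py", "web/backend/manage.py"]))
# ['web', 'web/backend', 'web/backend/main']
```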
Answer:
Cloning your repository (https://github.com/happyprogrammer-code/gittest-django-simpleapp), I find that there are two commits in it:
$ git log --all --decorate --oneline --graph
* fbef865 (HEAD -> master, origin/master, origin/HEAD) organizing the files in to backend file
* 018aed1 first commit
Here's what's in commit 018aed1:
$ git ls-tree -r HEAD^
100644 blob b7c617c5cf9a85407c312d9e32da17de165cf81c .gitignore
100644 blob 94ff3859fbf86be732f633254a110a860a8bbb65 requirements.txt
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 web/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99 web/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b web/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788 web/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a web/main/wsgi.py
100644 blob fbda2b3121972503edd5072a6f077c6f744abe77 web/manage.py
Here's what's in commit fbef865:
$ git ls-tree -r HEAD
100644 blob b7c617c5cf9a85407c312d9e32da17de165cf81c .gitignore
100644 blob 94ff3859fbf86be732f633254a110a860a8bbb65 requirements.txt
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 web/backend/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99 web/backend/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b web/backend/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788 web/backend/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a web/backend/main/wsgi.py
100644 blob fbda2b3121972503edd5072a6f077c6f744abe77 web/backend/manage.py
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 web/main/__init__.py
100644 blob 63356bd12bf3370b9b4e6411d6b24e1fe5681e99 web/main/asgi.py
100644 blob 905914bb8d0aa8951948ba5f1a5167ea4627e35b web/main/settings.py
100644 blob e1637d6a6898f22e07fcfaee419c67565997e788 web/main/urls.py
100644 blob e1c99c8fb064bb93d7710bdf80e80231fc95343a web/main/wsgi.py
So the reason you see both sets of files on GitHub is that your most recent commit contains both sets of files.
"Keep in mind I moved [web/main to web/backend/main] so web/main should not exist anymore and it didn't exist in my local machine"
You need to be careful here, because Git does not use the files in your working tree to build new commits. Git uses, instead, the files that are in Git's index. I believe I know precisely what happened, but before I suggest what I think happened, let's describe Git's index.
The index
Git's index is quite central to Git, yet it's largely invisible. If you look in a (non-bare) repository, you see a bunch of ordinary files plus, if you look closely, a .git directory. The .git directory contains Git's databases and ancillary control files:
$ ls .git
branches HEAD info packed-refs
config hooks logs refs
description index objects
The big database (of all of Git's objects) is in .git/objects, and the smaller ones (e.g., branch and other names) are in .git/packed-refs, .git/refs, and other files. That file named .git/index is where Git's index is being stored at the moment,[1] but it's a binary file:
$ file .git/index
.git/index: Git index, version 2, 13 entries
What's really in that file is messy, and you're supposed to use the plumbing commands git ls-files and git update-index to manipulate it if you ever need to go that low-level. But to describe Git's index (also known as the staging area, and sometimes as the cache) in as few words as possible, I like this phrase: the index holds your proposed next commit.
Git will fill in this index from a commit when you check that commit out, with git checkout or git switch. The actual contents are just index entries, which you can see with git ls-files --stage; each one has a mode, a hash ID (usually a blob hash ID), a staging number (usually zero), and a pathname such as web/main/__init__.py. Note that these file names contain embedded (forward) slashes: there are no folders or directories at this level, just files whose names have slashes in them.[2]
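The four parts of an index entry can be seen by splitting one line of git ls-files --stage output. This sketch assumes the default output format ("<mode> <object> <stage>", a tab, then the path) and paths that Git did not need to quote; IndexEntry and parse_ls_files_stage are hypothetical names. The path comes back as one string with slashes in it, not as a nested folder structure.

```python
from typing import NamedTuple

class IndexEntry(NamedTuple):
    mode: str   # e.g. "100644" for a regular, non-executable file
    oid: str    # hash ID of the blob holding the file's data
    stage: int  # staging number, normally 0
    path: str   # full path name, slashes included

def parse_ls_files_stage(line: str) -> IndexEntry:
    """Parse one line of `git ls-files --stage` output (non -z form)."""
    meta, path = line.rstrip("\n").split("\t", 1)  # path follows the tab
    mode, oid, stage = meta.split(" ")
    return IndexEntry(mode, oid, int(stage), path)

entry = parse_ls_files_stage(
    "100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0\tweb/main/__init__.py")
print(entry.path)   # web/main/__init__.py
print(entry.stage)  # 0
```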
Git will also fill in your working tree during this checkout. That gives you usable copies of files. The blobs that the index entries refer to are read-only, Git-ified, compressed, and de-duplicated copies of the files' data. Note in the git ls-tree -r output above that various blob hash IDs repeat: the files' data are stored only once, in each of those blobs, even though the files are "in" two commits. In the second commit, two files both have hash ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391. (This is in fact the hash ID of the empty file: every empty file, in every Git repository, has this particular hash ID.)
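Why every empty file gets the same ID: a blob's hash ID is computed from its content alone, never from its name. The sketch below shows the scheme for ordinary SHA-1 repositories (repositories created with the SHA-256 object format produce different IDs); git_blob_hash is a hypothetical name, and git hash-object is the real command that does this.

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    """Compute the SHA-1 object ID Git gives a blob with this content.

    Git hashes the header "blob <size>" plus a NUL byte, followed by the
    raw bytes, so identical content always yields the identical ID.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()

print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the name is not part of the input, web/main/__init__.py and web/backend/main/__init__.py share one blob.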
As you work in your working tree, you'll change the content of some files. You might even rename files, or shuffle the tree around. None of this affects Git's index, which holds independent copies of the files. Eventually, though, you'll want to put the new content into a new commit. Now you must run git add.
What git add does is read the working tree file, compress and de-duplicate it, and update Git's index. If the new content matches some existing blob, the hash IDs match, and Git just re-uses the existing blob. If the new content is truly new, Git stores it in the objects database and puts the new hash ID into the index. Either way, the updated file is now ready to go into the new commit.
When you run git commit, Git simply packages up all the files that are in the index at that time, in the form they have at that time, to make the new commit. So whatever is in the index is now in the new commit.
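A toy model of git add and git commit, under simplifying assumptions: the object database and index are plain dicts, the index maps each path to a hash ID only (mode and stage omitted), and add, commit, and blob_id are hypothetical names. It shows the two points above: identical content is stored once, and a commit is a snapshot of the index, not of the working tree.

```python
import hashlib

def blob_id(data: bytes) -> str:
    # Git's blob ID scheme (SHA-1 repos): hash of "blob <size>\0" + content.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

objects = {}  # toy object database: hash ID -> content
index = {}    # toy index: path -> hash ID

def add(path: str, worktree: dict) -> None:
    """Toy `git add <path>` for one file that exists in the working tree."""
    data = worktree[path]
    oid = blob_id(data)
    objects.setdefault(oid, data)  # existing blob is re-used, not duplicated
    index[path] = oid

def commit() -> dict:
    """Toy `git commit`: snapshot whatever the index holds right now."""
    return dict(index)

worktree = {"a.txt": b"same\n", "b.txt": b"same\n"}
add("a.txt", worktree)
add("b.txt", worktree)
snapshot = commit()
print(len(snapshot), len(objects))  # 2 1
```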
[1] There's an environment variable, GIT_INDEX_FILE, that gives the path name of the principal index file. That index file can refer to other files, and many details are not promised one way or another, but the Git plumbing commands will obey GIT_INDEX_FILE if you set it, which allows you to use alternative index files. The old git stash script, back when git stash was a script, used this, for instance.
[2] This is why Git is unable to store empty directories. Storing a gitlink in the index is possible, and the presence of a gitlink makes Git create an empty directory, so that's one way to cheat: a gitlink is half of a submodule, so storing half a submodule in the index lets you make Git create an empty directory. But you need the other half to make the submodule complete and keep Git happy. See also "How can I add a blank directory to a Git repository?"
Removing or renaming files
If you remove or rename a file in your working tree, nothing happens in Git's index yet. You must tell Git to update its index.
The commands to do this include git add as before, but also git rm and git mv. The only one you actually need is git add, because git add can remove a file from Git's index. Let's create a file in Git's index:
$ touch foo
$ git add foo
$ git ls-files --stage foo
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 foo
(There's that empty-file hash ID again.) Now we'll remove the file from the working tree, see that it remains in Git's index, and then run git add and see what happens:
$ rm foo
$ git ls-files --stage foo
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0 foo
$ git add foo
$ git ls-files --stage foo
$
So git add will "stage a deletion": if the file is gone from the working tree, git add removes the file from the index too.
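The foo session above can be replayed in a toy model. Assumptions: the index maps a path straight to its content (real Git stores a hash ID), the model follows Git 2.x, where git add on a missing path stages its removal, and toy_add is a hypothetical name.

```python
def toy_add(path, index, worktree):
    """Toy `git add <path>` for a single file (Git 2.x behaviour).

    If the file is in the working tree, its current content is staged;
    if it has vanished from the working tree, the deletion is staged.
    """
    if path in worktree:
        index[path] = worktree[path]
    else:
        index.pop(path, None)

index = {"foo": b""}   # after `touch foo` and `git add foo`
worktree = {}          # after a plain shell `rm foo`
print("foo" in index)  # True: the shell rm did not touch the index
toy_add("foo", index, worktree)
print("foo" in index)  # False: git add staged the deletion
```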
So, why do git rm and git mv exist? They're there for convenience:
$ touch foo
$ git add foo
$ git rm foo
error: the following file has changes staged in the index:
foo
(use --cached to keep the file, or -f to force removal)
$ git rm -f foo
rm 'foo'
The git rm command will remove the file from both the index and your working tree (and, as you can see, in some cases it will complain and do nothing, requiring -f or --force to make it go ahead). Or:
$ touch foo
$ git add foo
$ rm foo
$ git rm foo
rm 'foo'
If the file is already gone from the working tree, git rm acts a lot like git add, simply removing it from the index. Last, there's git rm --cached. This is where the old name, cache, turns up. This variant removes the file from Git's index / staging area without touching your working tree.
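The difference between plain git rm and git rm --cached, as a toy model: both drop the index entry, and only the plain form also deletes the working-tree file. Git's safety check (the error that needs -f) is deliberately left out, and toy_rm is a hypothetical name.

```python
def toy_rm(path, index, worktree, cached=False):
    """Toy `git rm [--cached] <path>`, without Git's safety checks.

    Plain `git rm` removes the path from the index and the working tree;
    `--cached` removes it from the index only.
    """
    index.pop(path, None)
    if not cached:
        worktree.pop(path, None)

index, worktree = {"foo": b""}, {"foo": b""}
toy_rm("foo", index, worktree, cached=True)
print("foo" in index, "foo" in worktree)  # False True
```

After the --cached form, foo is still on disk but absent from the index, which makes it an untracked file.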
Similarly, git mv is there for convenience. Git does not store renames: each commit just has a full snapshot of every file. Later, at git diff time, Git will guess about renames: if the commit on the "left side" of a diff has some file X with some content and no file Y, and the commit on the "right side" has some file Y with the same content X used to have but no file X, then probably file X got renamed to file Y. But to get there, we have to:
- add an index entry under the new name, with the same content hash (and mode) that the old name had;
- remove the old-named file's mode-and-hash from the index; and
- rename or otherwise update the file in the working tree, according to whatever this OS's file system rules are.
We can do all three of these at once with git mv. Or we can rename the file with the OS's file-renamer (mv, rename, or whatever it may be) and then run git add twice, or git rm and git add, or whatever. But running one git mv command is a little more convenient.
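The three steps listed above, written out as a toy git mv. The index is a dict from path to (mode, hash ID) and the working tree is a dict from path to content; toy_mv is a hypothetical name, and Git's real checks (existing targets, untracked sources) are omitted.

```python
def toy_mv(old, new, index, worktree):
    """Toy `git mv old new`: the three index/working-tree steps."""
    index[new] = index[old]            # 1. new name, same mode and hash
    del index[old]                     # 2. old name leaves the index
    worktree[new] = worktree.pop(old)  # 3. rename in the working tree

index = {"X": ("100644", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")}
worktree = {"X": b""}
toy_mv("X", "Y", index, worktree)
print(sorted(index), sorted(worktree))  # ['Y'] ['Y']
```

Skipping step 2 while doing steps 1 and 3 leaves both names in the index, which is exactly the situation described in the next section.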
So that's what git rm and git mv are for: they're more convenient ways to achieve the result we want, which is to say, manipulate some file(s) in both Git's index and our working tree. We don't need either one. We can do the file manipulation with regular OS commands, because files in our working tree are regular OS files. Then we can update Git's index with git add, which can do everything.
What I think you did
By now you probably know what you did, but here's what I think you did (assuming a Unix-like system's shell commands and maybe with some minor differences):
mkdir web/backend
mv web/main web/backend/main
git add web/backend
This told Git to copy all the renamed files into Git's index under their new names. But Git never took out the old names, so those index entries, with their mode 100644, blob <whatever>, stage zero, and path name, remained in Git's index. Then you ran:
git commit
and Git packaged up the index contents, complete with duplicate copies of each file.
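The suspected sequence can be replayed in a toy model, assuming an index that maps path to abbreviated hash ID and a git add <dir> that only touches paths under that directory (the Git 2.x pathspec behaviour). Only two of the files are shown, and toy_add_dir is a hypothetical name. The old web/main entries survive into the next commit alongside the new ones.

```python
def toy_add_dir(prefix, index, worktree):
    """Toy `git add <dir>`: stage working-tree files under `prefix`.
    Index entries outside `prefix` are left exactly as they were."""
    for path, oid in worktree.items():
        if path.startswith(prefix + "/"):
            index[path] = oid

# Toy index after the first commit: path -> abbreviated blob hash.
index = {
    "web/main/__init__.py": "e69de29",
    "web/main/urls.py": "e1637d6",
}

# After `mkdir web/backend` and `mv web/main web/backend/main`:
# the working tree has only the new names; the index is unchanged.
worktree = {
    "web/backend/main/__init__.py": "e69de29",
    "web/backend/main/urls.py": "e1637d6",
}

toy_add_dir("web/backend", index, worktree)  # `git add web/backend`
next_commit = sorted(index)                   # what `git commit` packages up
print(next_commit)
# ['web/backend/main/__init__.py', 'web/backend/main/urls.py',
#  'web/main/__init__.py', 'web/main/urls.py']
```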
Use git status to view the interesting parts of the index
Your working tree contains fewer files than your index, and if you run git status, Git will run two git diff operations:
- One will compare the HEAD commit to Git's index. Since the HEAD commit was made from Git's index, these two will match exactly and this git diff will have nothing to say, so git status won't print anything from it.
- The second git diff --name-status operation will compare what's in Git's index to what's in your working tree. Since the index holds files that have mysteriously[3] vanished from your working tree, Git will call these files deleted. With git status --short they'll show up with status letter "D", and without --short they'll be listed as changes not staged for commit, with a bunch of files being "deleted".
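The two comparisons can be sketched as functions over path-to-hash mappings. This is a toy model, not Git's implementation: staged_changes and unstaged_changes are hypothetical names, and the second one deliberately looks only at paths in the index, because files found only in the working tree are untracked and git status reports them separately.

```python
def staged_changes(head, index):
    """First diff in `git status`: HEAD commit vs. index."""
    out = []
    for path in sorted(head.keys() | index.keys()):
        if path not in index:
            out.append(("D", path))
        elif path not in head:
            out.append(("A", path))
        elif head[path] != index[path]:
            out.append(("M", path))
    return out

def unstaged_changes(index, worktree):
    """Second diff: index vs. working tree, for indexed paths only."""
    out = []
    for path in sorted(index):
        if path not in worktree:
            out.append(("D", path))
        elif index[path] != worktree[path]:
            out.append(("M", path))
    return out

head = {"web/main/urls.py": "e1637d6"}
index = dict(head)                                 # HEAD was made from it
worktree = {"web/backend/main/urls.py": "e1637d6"}  # moved with plain mv
print(staged_changes(head, index))       # []
print(unstaged_changes(index, worktree))  # [('D', 'web/main/urls.py')]
```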
The git status command is therefore very helpful:
- First, it tells you which branch you're "on", i.e., when you make a new commit, which branch name will be updated to hold that new commit's new hash ID.
- Then, if the current branch has an upstream set, it tells you about the current branch as compared against its upstream. (If not, it skips over this part.)
- Now it runs the two diffs. Files in the index that match the HEAD commit are not very interesting, and it says nothing about those, but files that are gone, or are new, or differ, are interesting: it tells you about those, saying that they are staged for commit. For the second diff, files that match in the index and working tree are dull, so it says nothing about them, but when files don't match it tells you about the changed or deleted files.
There's something a little odd here. Files that are in your working tree, but aren't in Git's index, don't get listed in the above sections. Instead, this last group of files is segregated into its own section. These files are—by definition—your untracked files. Any file that exists in your working tree right now, but is not in Git's index right now, is an untracked file.
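That definition of "untracked" is a plain set difference, shown here as a toy sketch (untracked is a hypothetical name; the index is a dict keyed by path, the working tree a list of paths).

```python
def untracked(index, worktree_paths):
    """Files in the working tree right now but not in the index right now."""
    return sorted(set(worktree_paths) - set(index))

index = {"web/main/urls.py": "e1637d6"}
worktree = ["web/backend/main/urls.py"]
print(untracked(index, worktree))  # ['web/backend/main/urls.py']
```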
When working with Python code, we get byte-compiled .pyc or .pyo files (depending on optimization) whose names go in __pycache__ or not (depending on Python version). These files clutter up your working tree, and if Git were to complain about them as being untracked files, that could be pretty noisy.[4] So Git allows you to tell git status to shut up about certain untracked files. That's half of what .gitignore is about.
These untracked files should also stay untracked. That's the other half of what .gitignore is about: an en-masse add operation, like git add ., will normally add every file to Git's index, including all the untracked ones. But the .pyc files shouldn't get added. By listing *.pyc in .gitignore, you tell git status to shut up about them, and git add not to add them.
There's one other thing you need to know about this, though: once a file is in Git's index (however it got there, whether by git add or by being read out of some existing commit, that doesn't matter), the file is not ignored, even if it's listed in a .gitignore. You would have to remove it, e.g., with git rm --cached, to get it out of Git's index and get it ignored again. Removing it from Git's index means it's not in your next commit, once you make it, which means it won't be in Git's index when someone checks out that commit someday, either. This has consequences, especially when it's already in some other Git commit, where checking out that commit means it will be in Git's index. But worry about those later, once you have the rest of this index stuff down cold.
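A toy sketch of how ignore rules interact with the index during git add . under heavy simplifying assumptions: only base-name glob patterns such as *.pyc are handled (real .gitignore rules also cover directory patterns, patterns containing slashes, and ! negation), and toy_add_all is a hypothetical name. An untracked file matching a pattern is skipped, while a tracked file matching the same pattern is still considered.

```python
from fnmatch import fnmatchcase

def toy_add_all(worktree_paths, index, ignore_patterns):
    """Toy `git add .`: which working-tree files get (re)staged.

    Tracked files (already in the index) are never ignored; untracked
    files are skipped when a pattern matches their base name.
    """
    staged = []
    for path in sorted(worktree_paths):
        name = path.rsplit("/", 1)[-1]
        ignored = any(fnmatchcase(name, pat) for pat in ignore_patterns)
        if path in index or not ignored:
            staged.append(path)
    return staged

worktree = ["web/manage.py", "web/main/__pycache__/urls.cpython-311.pyc", "old.pyc"]
index = {"old.pyc": "0123abc"}  # committed before *.pyc was ignored
print(toy_add_all(worktree, index, ["*.pyc"]))  # ['old.pyc', 'web/manage.py']
```

Removing old.pyc from the toy index (the git rm --cached step) would make the pattern apply to it again.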
[3] No longer mysterious, now that you know.
[4] With __pycache__ and the way git status normally summarizes some untracked files, you get just one line here, which isn't that annoying. With the old Py2k scheme, you get one line per file, which is pretty annoying.