Why does 'git rm' rename a file instead of removing it?

Time:09-23

I am not an expert with Git, so I want to understand why this happened. In my project I had two files (the ones relevant to this question), main.c and clase.c, that are completely different. Even so, when I typed git rm main.c to remove that file, the status showed that Git didn't delete main.c but renamed it instead.

The final result is OK, but why did that happen?

Thanks for the help! If you want to check what I'm doing, see my OpenGL learning path.

CodePudding user response:

As Useless noted in a comment, Git doesn't actually have file renaming. Instead, it detects certain cases that look like file renaming, and then reports those as if the file was renamed.

Your image (side note: try not to use images at all, if possible; see How to Ask) shows the output from git status, with git status claiming that, in the proposed new commit, you have:

  • modified the .gitignore file
  • created a new file named PLANTILLAS/plantilla3D.c
  • renamed a file from main.c to clase.c
  • modified the notas.txt file
  • created a new file named profe.c

You have not in fact done any of that, unless maybe you did. What Git really knows is that, if you made a new commit right now, using the copies of files that are in Git's index right now, the new commit—whatever its hash ID might be; let's call it commit H, where H stands for hash ID—would differ from the current commit. The current commit has some other hash ID; let's call this one G, which does not stand for anything (unless we make up something like Gedankenexperiment, perhaps).

Commit G, which already exists and will continue to exist forever,[1] has a snapshot: an archive of all files, like a tar or zip archive (though in a very different and rather Gitty format). The snapshot that is in commit G now differs from the one in proposed future commit H.

The git diff command, when run and given two commit hash IDs, extracts those commits to somewhere temporary,[2] and then compares them. The two extracted commit archives may have files that have the same names, or not. Those files with the same names may have the same contents, or not. The point of doing this diff is to come up with a reasonably minimal set of changes that, if applied to the extracted left-side commit, will produce the extracted right-side commit.

That means that if any file is completely unchanged (has the same name and the same content), Git can just skip right over it, without mentioning it at all. If the file has the same name but different content, the file is "modified" and Git should figure out what changed, as if playing a game of Spot the Difference.[3]

Sometimes, however, the left-side commit has a file, foo.txt or main.c or whatever, and that file just isn't there at all in the right-side commit. Git can claim that this file is Deleted and be done with it. By the same token, of course, sometimes the left-side commit doesn't have a file, and the right-side commit does have a file, such as clase.c. Git can claim that this file is newly Added and be done with it.

If Git did only this, and you renamed a file without changing it at all, Git would claim that the file was deleted from the left side, and a differently-named file was added to the right side. And in fact, if you run git diff --name-status --no-renames, this is exactly what git diff will claim, if you renamed a file. The recipe—"remove old file, create new one"—works just fine. It's just not very minimal.
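The no-rename comparison described above can be sketched in a few lines. This is only an illustration, not Git's code: the snapshots are plain dictionaries mapping file names to contents, and name_status, commit_g, and commit_h are made-up names for this example.

```python
def name_status(left: dict, right: dict) -> list:
    """Compare two snapshots (name -> content) without any rename detection."""
    changes = []
    for name in sorted(set(left) | set(right)):
        if name not in right:
            changes.append(("D", name))   # present on the left only: Deleted
        elif name not in left:
            changes.append(("A", name))   # present on the right only: Added
        elif left[name] != right[name]:
            changes.append(("M", name))   # same name, different content: Modified
        # same name and same content: skipped entirely
    return changes


# Hypothetical snapshots: a file was renamed without changing its content.
commit_g = {"main.c": "int main(void) { return 0; }\n", "notas.txt": "old notes\n"}
commit_h = {"clase.c": "int main(void) { return 0; }\n", "notas.txt": "new notes\n"}

for status, name in name_status(commit_g, commit_h):
    print(status, name)
```

Even though the content of the renamed file is untouched, this recipe reports one deletion and one addition, which is what makes it correct but not minimal.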

Humans don't like this. So, when you allow it to do so, git diff (or anything that uses git diff) can check: Hey, here's a file that disappeared from the left. Here's a file that appeared on the right. Are the contents exactly 100% identical? This test is very fast because of the way Git de-duplicates file contents: identical contents are stored once, under a single hash ID, so the check is just a comparison of two hash IDs.

So, if you allow this, Git will call this a renamed file, pairing up the left-side name (main.c) with the right-side name (clase.c) and saying the file was renamed. Now there are fewer Deleted files on the left and fewer Added files on the right.
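Here is a rough sketch of the exact-match step. The blob_id function follows Git's blob hashing (SHA-1 over a "blob <size>" header, a NUL byte, and the content); everything else, including the function name pair_exact_renames and the sample file names, is made up for illustration and is not how Git organizes this internally.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Hash content the way Git hashes a blob object."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def pair_exact_renames(deleted: dict, added: dict) -> list:
    """Pair each deleted file with an added file that has the same blob ID."""
    added_by_id = {}
    for name in sorted(added):
        added_by_id.setdefault(blob_id(added[name]), []).append(name)

    renames = []
    for old_name in sorted(deleted):
        candidates = added_by_id.get(blob_id(deleted[old_name]), [])
        if candidates:
            # Identical hash ID means identical content: call it a rename.
            renames.append((old_name, candidates.pop(0)))
    return renames


# Hypothetical leftovers after the plain Added/Deleted comparison.
deleted = {"foo.txt": b"hello\n"}
added = {"bar.txt": b"hello\n", "baz.txt": b"something else\n"}
print(pair_exact_renames(deleted, added))
```

No file content is compared byte by byte here; only the hash IDs are looked up, which is why this phase is cheap.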

Of course, your clase.c is not simply a renamed copy of main.c, with 100% identical C code contents. But: maybe you not only renamed the file, but also modified it slightly. If so, calling it a rename, and showing a diff to the contents, will produce a smaller and more-human-pleasing diff. So now, having paired up exact-same-content renamed files, Git checks for any and all possible remaining pairing-up operations with a deleted-on-left added-on-right pair. Since main.c is gone on the left, and clase.c and profe.c and PLANTILLAS/plantilla3D.c are all new on the right, the candidate pairings are:

  • main.c, clase.c
  • main.c, profe.c
  • main.c, PLANTILLAS/plantilla3D.c

Git now does a similarity computation on the contents of the possible pairings. How similar is main.c's content to the content of each of the three files?

This similarity computation is not documented, but I worked through it once (see Trying to understand `git diff` and `git mv` rename detection mechanism). Its output is a single number expressed as a percentage: any value that is at least 50% is considered a match by default, and the pairing with the highest similarity is taken, removing main.c from the rename detection queues. Since there are no longer any D-state files on the left to pair with the remaining A-state files on the right, the rename detection phase is now complete: main.c is most similar to clase.c, so Git deems these two to be paired as a rename.
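The selection step can be sketched as follows. Note the big assumption: the similarity function here uses Python's difflib on lines as a stand-in and is not Git's actual similarity computation, so the percentages will not match what Git reports. Only the "at least 50%, take the highest" logic mirrors the description above. The file contents are invented.

```python
from difflib import SequenceMatcher


def similarity(old: str, new: str) -> float:
    """Stand-in similarity score in percent (NOT Git's real algorithm)."""
    matcher = SequenceMatcher(None, old.splitlines(), new.splitlines())
    return 100.0 * matcher.ratio()


def best_rename(old_text: str, added: dict, threshold: float = 50.0):
    """Return (new_name, score) for the most similar added file, or None."""
    best = None
    for new_name in sorted(added):
        score = similarity(old_text, added[new_name])
        if score >= threshold and (best is None or score > best[1]):
            best = (new_name, score)
    return best


# Hypothetical contents for the deleted file and the three added files.
main_c = '#include <stdio.h>\nint main(void)\n{\n    puts("hola");\n    return 0;\n}\n'
added = {
    "clase.c": '#include <stdio.h>\nint main(void)\n{\n    puts("clase");\n    return 0;\n}\n',
    "profe.c": "void dibujar(void)\n{\n}\n",
    "PLANTILLAS/plantilla3D.c": "// plantilla\n",
}

match = best_rename(main_c, added)
print(match[0] if match else "no rename detected")
```

Raising the threshold argument makes the detector stricter; with a high enough value, no candidate qualifies and main.c would simply be reported as deleted.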

All of this is a result of a diff between existing commit G and hypothetical new commit H. This diff is run with rename detection turned on and the minimum similarity threshold set at 50%.[4] The proposed commit H consists of the files stored in Git's index. By running git add or git rm, you add, replace, or remove files from Git's index. This changes the proposed new commit, and git status's rename detector changes its behavior accordingly.
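To see why git rm in particular triggered the "renamed" report, here is a toy model of the index as a dictionary. It assumes, as the status output suggests, that clase.c had already been added before main.c was removed; the names head, index, and staged_changes are invented for this sketch.

```python
def staged_changes(head: dict, index: dict):
    """Return (deleted, added) file names between the current commit and the index."""
    deleted = sorted(set(head) - set(index))
    added = sorted(set(index) - set(head))
    return deleted, added


# Current commit G, and an index that starts out as a copy of it.
head = {"main.c": "old program\n", "notas.txt": "notes\n"}
index = dict(head)

index["clase.c"] = "new program\n"        # roughly what `git add clase.c` does
before = staged_changes(head, index)
print(before)   # nothing deleted yet, so there is nothing to pair clase.c with

del index["main.c"]                       # roughly what `git rm main.c` does
after = staged_changes(head, index)
print(after)    # one deleted and one added file: a candidate rename pair
```

Before the removal there is no deleted file, so no rename can be reported. After it, the rename detector has a deleted/added pair to compare, and the status output changes from "new file" to "renamed" if the two are similar enough.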


[1] Well, unless you work pretty hard to get rid of it, or all clones of this repo get destroyed by mistake. Once a commit exists, every Git that has it "likes to" give it to every other Git that doesn't have it, like some sort of virus. Hook one Git to another, and the sender will infect the receiver with every commit the sender can. Careful use of fetch and push refspecs can limit the infections to just those commits you wanted copied over.

[2] This temporary extraction is limited and done in memory, not on disk, to whatever extent possible. But except for going a lot faster, it acts like Git checked out each commit to a temporary tree somewhere.

[3] With --name-only or --name-status, Git can stop as soon as it determines the changed-ness, without having to read the file's content after all. That makes this go very fast. The git status command uses this mode as much as possible, so that it's fast. Rename detection, however, messes with this goal of not bothering to look inside files.

[4] These are the defaults. In older versions of Git, these defaults cannot be adjusted; in current Git, status.renames enables overall rename detection, and if that's not set, git status uses the diff.renames setting if that is set. Note that git status did not consult diff.renames in the past, and that diff.renames itself defaulted to false before Git 2.9.

There is also a renameLimit value for these, which controls the depth of the rename queue used when matching up files. The file matching is expensive computationally, except for exact-match detection. Using the command line option --find-renames=number, you can set the minimum threshold level for considering two similar files "renamed", as well; this option was always available in git diff, but became available in git status in Git 2.18.

CodePudding user response:

I think you're wrong about that: the command git rm file_name deletes the file from your staging area as well as from your local Git repo.

Tags: git