Is there a way to move git file history to another file?

Time:06-22

I'm making changes in my repository.

I have a file, index.d.ts, and I want to move all of its content to another file, file1.d.ts, along with the entire commit history of index.d.ts. But I don't want to delete index.d.ts from the repository.

I tried git mv index.d.ts file1.d.ts, but once I re-create index.d.ts with the new content, the history automatically moves back to index.d.ts and is removed from file1.d.ts.

Is there a way to move the current history of index.d.ts to file1.d.ts?

CodePudding user response:

When you want to see the history of a file after it has been renamed, you can use

git log --follow <filename>

It will keep giving you the history of the file through the rename commits all the way back to when the file was first created with its original name.

So use

git mv index.d.ts file1.d.ts

commit the results, and then later when you want to see the history, use

git log --follow file1.d.ts

Ideally, make a commit with just the git mv operation, and then make a separate commit for any changes you want to make to the file after renaming it. --follow can still find a rename if the file changed a little in the same commit, but if it changed too much, Git might fail to recognize the rename and treat the file as an unrelated new one.
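Here is a minimal sketch of that workflow in a throwaway repository, in case it helps to see the whole sequence. The file contents, commit messages, and the demo identity are made up for illustration; only the file names come from the question.

```shell
#!/bin/sh
set -eu

# Work in a temporary repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# Commit 1: the original file (contents are illustrative).
printf 'export type A = string;\nexport type B = number;\n' > index.d.ts
git add index.d.ts
git commit -q -m "Add index.d.ts"

# Commit 2: the rename and nothing else.
git mv index.d.ts file1.d.ts
git commit -q -m "Rename index.d.ts to file1.d.ts"

# Commit 3: re-create index.d.ts with new content, as a separate commit.
printf 'export * from "./file1";\n' > index.d.ts
git add index.d.ts
git commit -q -m "Re-create index.d.ts"

# Subjects of the commits that --follow reports for file1.d.ts.
followed=$(git log --follow --format=%s -- file1.d.ts)
printf '%s\n' "$followed"
```

The list printed at the end should contain the rename commit and the original "Add index.d.ts" commit, but not the commit that re-created index.d.ts.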

CodePudding user response:

See joanis' answer for a practical solution: rename the file, commit, and then add a later, separate commit in which you add a new file that re-uses the old name. This enables git log --follow to follow the rename.

What to understand about how and why this works

You asked about "moving history to another file", but Git doesn't have "file history" in the first place. What Git has are commits. The commits are the history.

Each commit, in Git, holds a full snapshot of every file. (Each commit also holds some metadata, but for this answer we'll just concentrate on the snapshots.) The files in these snapshots are compressed, Git-ified, and frozen for all time, and (this is important) have their content de-duplicated. To see what that means, take any pair of commits in a row. Git calls these "parent" and "child" commits, but for the moment just think of them as "before" and "after", or "old" and "new". If the old commit has some file that reads "I am a file", and the new commit has some file, with any name at all, that also reads "I am a file", these two files are duplicates of each other, and are de-duplicated: Git stores the content only once.

Git can find a de-duplicated file with extreme ease. Git can also find similar files (files that are not quite the same, but are mostly the same), but that is much harder than finding exact matches, which de-duplication has already identified. The main point of the de-duplication is that when you make a new commit, you have probably changed only one or two files out of the dozens, or hundreds, or millions, or whatever, in the old commit. So almost all files in a new commit are duplicates of the files in the old commit. By de-duplicating the contents, Git saves lots of space: it only really has to store the modified files, even though both commits "contain" all the files.
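One way to observe the de-duplication, as a rough sketch: Git identifies each stored file content (a "blob") by a hash of that content, so identical content under two different names resolves to the same object ID. The file names a.txt and b.txt and the demo identity below are made up for illustration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# "old" commit: a file that reads "I am a file".
printf 'I am a file\n' > a.txt
git add a.txt
git commit -q -m "old: add a.txt"

# "new" commit: same content, different name.
git mv a.txt b.txt
git commit -q -m "new: rename a.txt to b.txt"

# <commit>:<path> names the blob stored at that path in that commit.
old_blob=$(git rev-parse HEAD~1:a.txt)
new_blob=$(git rev-parse HEAD:b.txt)
echo "old a.txt -> $old_blob"
echo "new b.txt -> $new_blob"
```

Both lines should show the same object ID, which is why checking for an exact match is so cheap: Git only has to compare two IDs, not two file contents.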

Now, as I just said above, the history in a repository is nothing more or less than the commits in the repository. When you run git log, Git finds the current commit (usually the latest commit on some branch, which you selected by running git checkout mybranch or git switch mybranch) and then works backwards, one commit at a time, showing you each commit's log message. That's because the commits are the history: if we start at the last one and show its log message, then hop back one to its parent and show that log message, then hop back one and show that log message, then ... well, repeat until we run out of commits: we've now seen the history.[1]

Sometimes, though, we want to know: In which commits did I change file F? So instead of just:

git log

we run:

git log -- F

The double hyphen here is required if, for instance, F is also a branch name: it tells git log that we mean file F and not branch F. When F can't be misinterpreted, we can leave out the --, but it's a good idea to include it every time.
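As a small illustration of why the -- matters, the sketch below creates both a file and a branch named F in a scratch repository. In that situation I would expect a bare git log F to be rejected as ambiguous, while git log -- F is accepted. The repository contents and the demo identity are made up.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

echo "some content" > F
git add F
git commit -q -m "Add file F"

# Now F names both a file and a branch.
git branch F

# Without "--", Git cannot tell which one we mean.
if git log --format=%s F >/dev/null 2>&1; then
    ambiguous=no
else
    ambiguous=yes
fi
echo "bare 'git log F' rejected as ambiguous: $ambiguous"

# With "--", F is unambiguously a path.
as_path=$(git log --format=%s -- F)
echo "git log -- F shows: $as_path"
```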

This still walks through all[2] the commits, backwards, one commit at a time. But this time Git makes use of the "file is exactly the same, so its contents got de-duplicated" thing. Let's say that in the current commit, file F exists and is exactly the same as it is in the previous (parent) commit. Git can load up the two commits' snapshots, immediately see that the file was de-duplicated, and not bother to print anything out about the current commit.

Git then moves back one hop to the previous commit. Git finds its parent, loads up the two snapshots, and checks to see if the file is the same (de-duplicated) or different (not-de-duplicated). If it's the same, Git doesn't mention this commit. If it's different, Git does mention this commit, in the usual way.

We then repeat, one commit at a time, until we run out of commits, checking to see if this commit and the previous commit both have the same version of the file (or both omit the file entirely). If so, the commit is uninteresting and Git doesn't print it. If not (if the file has changed, or sprung into existence, or been deleted, due to this commit), then this commit is interesting and Git does print it.

The end result of this is that we "see" a "history of file F". But there isn't a history of file F: there are just commits, and those are the history. What we've seen instead of a "history of F" is "all history, except for commits where F didn't change". Is that the same thing? Well, maybe: enter --follow.
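To make the filtering concrete, here is a sketch that builds three commits, only two of which touch F, and compares plain git log with git log -- F. The second file name, the commit messages, and the demo identity are made up for illustration.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# Commit 1: F springs into existence (interesting for F).
echo "one" > F
echo "x" > other.txt
git add F other.txt
git commit -q -m "add F and other.txt"

# Commit 2: F is unchanged (uninteresting for F).
echo "y" >> other.txt
git commit -q -am "change other.txt only"

# Commit 3: F changes (interesting for F).
echo "two" >> F
git commit -q -am "change F"

all=$(git log --format=%s)
only_f=$(git log --format=%s -- F)

echo "== git log =="
printf '%s\n' "$all"
echo "== git log -- F =="
printf '%s\n' "$only_f"
```

The second listing should omit "change other.txt only": that commit is still part of the history, it just isn't printed because F is identical in it and in its parent.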

Suppose we're at some pair of commits <old, new>. Git will extract the two snapshots and see if the file named F exists in both, or in neither. Suppose further that F exists in new but does not exist in old. Because of --follow, Git can now look for some file that does exist in old and doesn't exist in new. Let's say there are five such files. These are our candidate files. We now check to see if any of these five files are "similar enough" to the copy that's in new.

This check goes really fast for "absolutely identical", so we have Git do that check first. If any of the five files are 100% identical and hence got de-duplicated, why, the file whose name is oldF in old must have been renamed to F in new. So:

  • git log prints commit new because the file changed (it was going to do this anyway), and
  • git log --follow changes the name it is looking for. It's no longer looking for file F, it's now looking for oldF.

If the 100% identical check doesn't work out, git log uses the slower, "is it close enough" match. You can set the matching threshold: the default is to require at least a 50% match. If more than one file meets the threshold, Git will take the highest matching file and call that the "old file" name. If no file meets the threshold, Git will assume that file F was simply created in new (so that, walking backwards, it vanishes at this point), and continue looking for the name F to see if it shows up again in earlier commits.
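The similarity matching can be observed directly with git diff, which exposes the same kind of rename detection through its --find-renames option (short form -M). This is a sketch under assumptions: the 20-line file, the one-line edit, and the demo identity are made up, and the example uses git diff rather than git log --follow so that the similarity score is visible.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"  # placeholder identity

# Build a 20-line file so that one extra line is a small change.
i=1
while [ "$i" -le 20 ]; do
    echo "line $i of the original file"
    i=$((i + 1))
done > oldF
git add oldF
git commit -q -m "old: add oldF"

# Rename and edit in the same commit, so the match is not 100%.
git mv oldF F
echo "one extra line" >> F
git add F
git commit -q -m "new: rename oldF to F and edit it"

# 50% threshold: close enough, so this is reported as a rename (R<score>).
loose=$(git diff --find-renames=50% --name-status HEAD~1 HEAD)
# 100% threshold: only exact renames count, so this is a delete plus an add.
strict=$(git diff --find-renames=100% --name-status HEAD~1 HEAD)

echo "== threshold 50% =="
printf '%s\n' "$loose"
echo "== threshold 100% =="
printf '%s\n' "$strict"
```

With the 50% threshold the pair should show up as a single R line carrying a similarity score; with the 100% threshold the same change should show up as a separate delete of oldF and add of F.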

For all this to work, the old file name has to vanish between some pair of commits, and you have to start git log --follow with the new file name that appears in that same old/new commit-pair. Committing a rename-only operation will guarantee that (a) this happens, and (b) the match is a 100% match and therefore goes fast. So making a rename-the-file commit as a separate commit makes git log --follow work pretty well.

(There are flaws in git log --follow, though. See footnote 2, and note that as currently implemented, --follow can only handle one file name. You must know the new name: --follow does not work with git log --reverse.)


[1] More precisely, we've seen the reachable commits, starting from the current commit and working backwards. For (much) more on reachability, which is another key concept in Git, see Think Like (a) Git.

[2] This isn't quite true: git log -- <path> turns on what git log calls History Simplification. This makes Git drop certain history paths. When using --follow this is less of a loss than it might seem. For (much) more on this, see, e.g., "git log does not return the history of a file correctly", "Git log (--follow) not working to show history beyond renames", and other answers that mention history simplification.
