I have a "big" file containing two Python classes, and I want to split this file in two, one file for each class.
One solution would be to copy the contents of the classes into two new files and then delete the original, but this would produce a huge delta in the history and would not keep track of the previous states of the original file (as it has been deleted).
I want to split the file in two such that the deltas between the two states have only two lines, see the linked picture. I hope I made it clear enough.
Is it possible?
NB: an intermediate approach would be to cut and paste only the second part of the file, and then use git mv, so we would keep track of half of the file. But we would still have a "huge" delta in the history, which is what I'm trying to avoid.
CodePudding user response:
The thing to understand here is that Git does not store deltas or history. There is no "delta" that can be big or small, and there is no "history" to "keep". Git calculates an account / presentation of the history when you do, say, a git log, and it calculates the deltas when you do, say, a git diff (or a git log that shows diffs as patches).
You cannot really manipulate or second-guess how this works. When you do a git log, you can tweak how similar two files need to be in order to be considered "the same file" when one has vanished and the other has appeared (because you renamed the file). But if you are hoping that somehow both files in the split will magically "lead back" to the one original file in the previous commit, give up; that's not how Git thinks.
And you should not worry about the "size" of "deltas" because there are no deltas. Every commit is a snapshot of all your files at that moment. There's no point trying to second-guess that. Just let Git do its thing.
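To make the "snapshot" point concrete, here is a minimal sketch in a throwaway repository. The file name big.py, the commit messages, and the demo identity are made up for illustration. It shows that each commit names a complete blob for the file, and that the "delta" only appears when git diff is asked to compare two snapshots.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical identity passed per command, so no global config is needed.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit -q "$@"
}

# First snapshot: one class.
printf 'class A:\n    pass\n' > big.py
git add big.py
commit -m "one class"

# Second snapshot: two classes.
printf 'class A:\n    pass\n\nclass B:\n    pass\n' > big.py
git add big.py
commit -m "two classes"

# Each commit's tree points at a full blob for big.py, not at a patch.
old_blob=$(git rev-parse HEAD~1:big.py)
new_blob=$(git rev-parse HEAD:big.py)
old_type=$(git cat-file -t "$old_blob")
new_type=$(git cat-file -t "$new_blob")
echo "old: $old_type $old_blob"
echo "new: $new_type $new_blob"

# The whole old file can be read back from the old snapshot alone.
git cat-file -p "$old_blob"

# The delta is computed on demand from the two snapshots.
added=$(git diff --numstat HEAD~1 HEAD -- big.py | cut -f1)
echo "lines added according to git diff: $added"
```

Both objects are plain blobs holding the complete file; the line count reported at the end is something git diff works out at the moment you ask.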
CodePudding user response:
I want to split the file in two such that the deltas between the two states have only two lines, see the linked picture,
Deltas for presentation depend on the audience and purpose and are basically guaranteed suboptimal for storage compression. Git's internal deltas are done for storage compression. They're not done against just some previous version, they're done against as much history as Git's been configured to inspect; at the factory defaults, for most projects, that's "all of it". Nobody but the devs ever wants or needs to see those.
If you see a presentation delta you don't like, make it go away.
For instance, if you know your code was extracted from a larger source, turn Git's copy/rename sensitivity way up and have it just trace the current hunk:
git log -p -C30 -L1,`wc -l<myfile.py`:myfile.py
The simplest way to see what produced your current source in the history you're describing is git blame -C30, which will show you where all your current source was added, and with a decent programmer's editor you can step back through the versions with like two keystrokes.
I don't think anyone's yet implemented a summarizer that will reduce a whole-file add to just say "all of it", but when I do
git log -p -C30 -L1,`wc -l <split1`:split1
on a test blob, ~500 paragraphs of lorem ipsum in testing split into 200 paragraphs in split1 and the rest in split2, it shows me just that hunk added in testing.
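If you want to try something like this yourself, here is a rough sketch of a similar experiment in a throwaway repository. The generated text, the demo identity, and the commit messages are assumptions of mine, and it uses git blame -C -C (documented as also looking for copies in the commit that creates the file) rather than the -C30 form above. It counts how many lines of split1 blame traces back to the commit that first added testing.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical identity passed per command, so no global config is needed.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit -q "$@"
}

# Commit 1: one "big" file of 500 distinct lines.
i=1
while [ "$i" -le 500 ]; do
    echo "paragraph $i: lorem ipsum dolor sit amet, consectetur adipiscing elit"
    i=$((i + 1))
done > testing
git add testing
commit -m "add testing"
first=$(git rev-parse HEAD)

# Commit 2: split the file in two and delete the original.
head -n 200 testing > split1
tail -n +201 testing > split2
git rm -q testing
git add split1 split2
commit -m "split testing into split1 and split2"

# --line-porcelain prints one header per line, starting with the full hash
# of the commit the line is attributed to. Count those naming commit 1.
traced=$(git blame -C -C --line-porcelain split1 | grep -c "^$first " || true)
echo "lines of split1 traced back to the first commit: $traced"
```

The split commit itself still shows up in git log as two added files and one deleted file; it is the blame view that follows the lines back to where they were originally written.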