Why does Git upload unchanged files again?

Time:01-28

My last commit contains some large files and has already been pushed to my feature branch. When I change the commit message (or rename/move a file inside the repository) with git commit --amend and then force-push it (*), the upload takes a long time because the full commit, file contents included, is sent again. So does Git not treat file contents independently of metadata such as the file name and the commit message?

(*) Yes, I know that a) I should not store large binary files in Git and b) should not force-push my own branch - but this is not the question.

Answer:

Git's method of figuring out what objects some other repository needs is (deliberately) imperfect: rather than exhaustively enumerating all potential objects, it simply enumerates a subset of commits. That is, your Git, doing git push, tells their Git (over at GitHub or wherever): "I have commit a123456" and they say "Oooh I don't have that one, gimme". This obliges your Git to offer the parent of a123456; they will check to see if they have the parent, and either ask for it or send a "no thanks already have it" kind of answer.

Now, if you are using "normal" commit sequencing:

...--F--G--H   <-- your-main
...--F--G   <-- their-main

you offer commit H and they say they want it, and then you offer commit G and they say that they have it, and your Git can use this to figure out that they have all objects that are contained in commit G and all earlier commits.[1] So only files whose content is completely new in H are included in the pack file your Git builds and sends.
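The walk just described can be sketched as a toy model. This is only an illustration of the answer's description, not Git's actual protocol code; the function name commits_to_send and the dictionary layout are made up for the example, and F is treated as the start of history for simplicity.

```python
# Toy model of the commit walk described above; not Git's real protocol code.
# Each commit maps to its parent (None marks the start of this toy history).
parents = {"F": None, "G": "F", "H": "G"}


def commits_to_send(tip, parents, receiver_has):
    """Offer commits from tip backwards until the receiver already has one."""
    to_send = []
    commit = tip
    while commit is not None and commit not in receiver_has:
        to_send.append(commit)      # receiver said "gimme"
        commit = parents[commit]    # so offer the parent next
    return to_send, commit          # commit is the first shared one, or None


sent, shared = commits_to_send("H", parents, receiver_has={"F", "G"})
print(sent, shared)  # ['H'] G
```

Only H has to be sent, and everything reachable from the shared commit G can be assumed present on the other side.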

If you then build a new commit atop that:

...--F--G--H--I   <-- your-main

and offer to send I, they'll ask for that; your Git now offers to send H again, they say "no thanks, already have that", and your Git knows not to re-send files (objects) whose contents are in H. (Your Git also gets to delta-compress against such objects.) But when you use git commit --amend, you build a graph like this:

          H
         /
...--F--G--I   <-- your-main

You offer to send them I and they say yes, then you offer to send them commit G (not H!) and they say no thanks, already have that one. You never notice that they have commit H, so when your Git builds a pack file, it includes any objects that are needed for commit I but not present in G-or-earlier.
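A rough sketch of why the large file travels again in the amended case, under the simplified model used in this answer. The object names (readme-v1, code-v1, big-file), the snapshot table, and the helper pack_for_push are all hypothetical; real Git packs trees and blobs by hash and is considerably more involved.

```python
# Toy model: I is the amended replacement for H, so both have parent G.
parents = {"F": None, "G": "F", "H": "G", "I": "G"}

# Hypothetical file objects contained in each commit's snapshot.
snapshot = {
    "F": {"readme-v1"},
    "G": {"readme-v1", "code-v1"},
    "H": {"readme-v1", "code-v1", "big-file"},
    "I": {"readme-v1", "code-v1", "big-file"},  # same files, new message
}


def reachable_objects(commit):
    """All objects in this commit and its ancestors."""
    objects = set()
    while commit is not None:
        objects |= snapshot[commit]
        commit = parents[commit]
    return objects


def pack_for_push(tip, receiver_commits):
    """Objects needed for the new commits, minus those known via the shared commit."""
    needed = set()
    commit = tip
    while commit is not None and commit not in receiver_commits:
        needed |= snapshot[commit]
        commit = parents[commit]
    known = reachable_objects(commit) if commit is not None else set()
    return needed - known


# The receiver has F, G and H, but the walk from I only ever reaches G.
pack = pack_for_push("I", receiver_commits={"F", "G", "H"})
print(sorted(pack))                 # ['big-file']
print(sorted(pack - snapshot["H"])) # [] if H had been taken into account
```

The second print shows the saving that is missed: had the sender also subtracted what is in H, nothing would be left to upload.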

Your large file that went over with H earlier is indeed in their repository, but your Git does not realize that, because your Git does not take the shared H into account. It could (and perhaps should), since your Git could get a list of all their branch names and commit hash IDs, and this would help both in this specific case and in the depth-1 shallow-clone case that is probably more frequent.[2]

Doing a perfect job requires a lot more work because they may have commits your Git has never seen:

       H   [abandoned]
      /
...--G--I   <-- your-main

...--G--H--K--L   <-- their-main

If they tell you that they have commit L, that tells your Git nothing useful, because your Git doesn't have commit L. If they told you they had K too, that still wouldn't help; they'd have to enumerate back to H so that you and they hit upon a shared commit. Alternatively, they could enumerate all tree, tag, and blob objects, but this might make the have/want part of the protocol excessively expensive. The common case, for a successful push, is that you are just adding on to their commits.
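To illustrate the cost, here is a small sketch of how far the other side would have to enumerate before naming a commit your repository knows. The helper first_shared and the data layout are invented for this example, and G is treated as the start of history to keep the model small.

```python
# Commits stored locally; H is abandoned but still present in the repository.
local_commits = {"G", "H", "I"}

# The other side's history: ...--G--H--K--L (child -> parent).
their_parents = {"L": "K", "K": "H", "H": "G", "G": None}


def first_shared(their_tip, their_parents, local_commits):
    """List their commits from the tip backwards until one is known locally."""
    listed = []
    commit = their_tip
    while commit is not None:
        listed.append(commit)
        if commit in local_commits:
            return commit, listed
        commit = their_parents[commit]
    return None, listed


shared, listed = first_shared("L", their_parents, local_commits)
print(shared, listed)  # H ['L', 'K', 'H']
```

In this toy history three commits are listed before a shared one turns up; on a busy branch the list could be far longer, which is the expense the paragraph above refers to.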


[1] This assumes that there are no shallow repositories involved here. With shallow repositories, the calculations are augmented a bit with "stops" at the shallow grafts during object enumeration. The theory is pretty obvious; the practical implementation details are ... less obvious and I have not delved into Git's actual implementation.

[2] This case is why cloning with depth 2, rather than 1, before adding and pushing a new commit tends to be so much more efficient. You'll get one extra commit and some other extra objects, but your git push will offer the parent, they'll say that they have it, and your git push will now be able to omit the objects they have. As it is, with depth 1, your git push assumes they have no objects at all and therefore sends the entire tree.
