We have a file in git, and it's LARGE: about 1 GB. This file is generated in-engine and takes hours of full CPU usage to generate, so we would like to push it up to git and have it maintained. However, it does not need to be pushed very often. In fact, we almost never want to push this file unless we are building for distribution.
A few times now, someone has pushed this file up accidentally and it uses lots of LFS bandwidth and storage. I was hoping to be able to add a git pre-commit hook to make sure that this file was added to the commit by force, or just some way to double check the addition of the file.
I've never made a hook script. Is there any way that I can accomplish this functionality with a pre-commit hook? I am struggling to find regular documentation on the creation of these scripts and what attributes I have access to. Also, if there is an easier way to accomplish what I want, that would be super appreciated!
CodePudding user response:
You can't really do this with a pre-commit hook. Remember that every Git commit holds a full snapshot of every file it contains. A git checkout or git switch operation that extracts some commit with hash H first removes any file that is currently in Git's index and your working tree because of your current check-out of commit G (some other hash ID than H), whenever that file exists in G but doesn't exist in H.
Hence, either the large file is in every commit, or it's in no commit: there's no real middle ground. Git does optimize things by literally re-using identical-content files, so if 100 commits contain version 1 of the large file, and the next 100 commits contain version 2 of the large file, then there are really only two copies of the large file in the repository, shared across 100 commits each.
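The sharing works because Git names each stored file (a "blob") by a hash of its content. A minimal sketch of that computation, assuming a repository that uses the default SHA-1 object format; the function name git_blob_id is made up for illustration:

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID Git gives a blob with this content (SHA-1 repositories)."""
    # Git hashes a short header ("blob", the byte length, a NUL byte)
    # followed by the raw file content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same bytes always produce the same ID, so 100 commits that carry an
# unchanged file all point at one stored blob.
version_1 = b"generated data, version 1\n"
version_2 = b"generated data, version 2\n"
assert git_blob_id(version_1) == git_blob_id(bytes(version_1))
assert git_blob_id(version_1) != git_blob_id(version_2)
```

Running git hash-object on a file prints the same ID, which is a quick way to check whether two copies of the large file are identical as far as Git is concerned.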
A git push that sends a new commit that re-uses the existing large file won't re-send the large file, since the receiving Git will announce that it has existing commit E when your Git goes to send new commit N, and your Git will then see that the large-file copy in N matches the large-file copy in E and therefore doesn't need to send the file with the new commit N. So it sounds like what you're really asking for, though you didn't realize it, is a way to detect whether git push will send a new and different large file that the receiving Git currently lacks.
You can't detect this perfectly but you can get a decent stab at it easily in a pre-push hook (rather than a pre-commit hook). Git runs the pre-push hook after your Git contacts their Git and finds out what hash ID goes with whatever existing refs they have that you're asking them to update. The pre-push hook can read, from its standard input, the complete list of refs and hash IDs that your Git is proposing to send to their Git, along with the hash IDs they have under whatever name your Git is proposing that they update.
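As a rough sketch of reading that input: Git writes one line per ref being pushed, holding the local ref name, the local hash ID, the remote ref name, and the remote hash ID, separated by spaces. The names RefUpdate and parse_pre_push below are invented for this example, and the all-zero constant assumes 40-character SHA-1 hash IDs.

```python
from typing import Iterable, Iterator, NamedTuple

# Git uses an all-zero hash to mean "no such ref on that side":
# a zero remote hash is a new ref, a zero local hash is a deletion.
ZERO_OID = "0" * 40


class RefUpdate(NamedTuple):
    local_ref: str
    local_oid: str
    remote_ref: str
    remote_oid: str


def parse_pre_push(lines: Iterable[str]) -> Iterator[RefUpdate]:
    """Yield one RefUpdate per well-formed line of pre-push standard input."""
    for line in lines:
        fields = line.split()
        if len(fields) == 4:  # skip blank or malformed lines
            yield RefUpdate(*fields)
```

Inside an actual hook the call would be parse_pre_push(sys.stdin); Git also passes the remote's name and URL as the hook's two command-line arguments.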
Using this information, you can see that, say, your Git is proposing to send to their Git the commit(s) that are reachable from your branch tip develop with hash ID a123456... . At this time, they have, as their branch tip of their develop, a commit with hash ID b976543... . Looking up this commit in your own repository, you find that b976543 does contain the large file, with content whose hash ID is deadcab..., and that of all the commits in b976543..a123456 in your own repository, they all contain the large file with the same hash ID: in this case you'll not be sending a new copy of the large file. Or, perhaps you don't have b976543, or b976543 has a large file with deadcab... as its blob hash but a123456... will send the large file with contents that hash to feeddad... instead. So this time your Git probably will send the large file, and you can force the user to confirm, for instance by setting an environment variable I_REALLY_MEANT_TO_DO_THAT=yes.
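A hedged sketch of that check, given the (local hash, remote hash) pairs read from the hook's standard input. The path in LARGE_FILE is a placeholder, the function names are invented for this example, and the logic is deliberately conservative: when the remote commit is unknown locally, it assumes the file will be sent.

```python
import os
import subprocess
import sys

LARGE_FILE = "Assets/generated.bin"  # hypothetical path; substitute the real one
OVERRIDE_VAR = "I_REALLY_MEANT_TO_DO_THAT"
ZERO_OID = "0" * 40  # assumes SHA-1 hash IDs


def git(*args):
    """Run git; return its stripped stdout, or None if the command failed."""
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.stdout.strip() if result.returncode == 0 else None


def blob_id(commit, path=LARGE_FILE, run=git):
    """Hash ID of path as stored in commit, or None if the commit lacks it."""
    return run("rev-parse", "--verify", "--quiet", f"{commit}:{path}")


def would_send_new_blob(local_oid, remote_oid, run=git):
    """Guess whether pushing local_oid sends a large-file blob the remote lacks."""
    if local_oid == ZERO_OID:
        return False  # deleting a ref sends no files
    have_remote = (remote_oid != ZERO_OID
                   and run("cat-file", "-e", remote_oid + "^{commit}") is not None)
    if not have_remote:
        # New ref, or their tip is not in our repository: we cannot compare.
        return blob_id(local_oid, run=run) is not None
    known = blob_id(remote_oid, run=run)
    outgoing = run("rev-list", f"{remote_oid}..{local_oid}") or ""
    for commit in outgoing.split():
        current = blob_id(commit, run=run)
        if current is not None and current != known:
            return True
    return False


def check_push(updates, environ=os.environ, run=git):
    """Return a hook exit status: 0 allows the push, 1 rejects it."""
    if environ.get(OVERRIDE_VAR) == "yes":
        return 0
    for local_oid, remote_oid in updates:
        if would_send_new_blob(local_oid, remote_oid, run=run):
            print(f"{LARGE_FILE} looks new or changed; "
                  f"set {OVERRIDE_VAR}=yes to push it anyway.", file=sys.stderr)
            return 1
    return 0

# An executable .git/hooks/pre-push would finish by splitting each stdin line
# into its four fields and calling sys.exit(check_push(pairs)).
```

With this in place, a deliberate push would look like I_REALLY_MEANT_TO_DO_THAT=yes git push. Since the file in question is tracked with LFS, the blob being compared is the small LFS pointer file, whose hash changes whenever the large content changes. Git LFS also installs its own pre-push hook, so this check would have to be combined with the existing git lfs pre-push call rather than replace it.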
This whole system is very klunky, and as far as I know, nobody does it like this. Instead, the large file is considered a "build asset" and stored outside the repository, along with information that tells you whether this particular copy of the build asset is right for your build. You first inspect the verification information: if it's right, you use the pre-built asset, and if not, you spend the time to build it (and then optionally deliver it to the asset server as well).
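What the verification information contains is up to you. One possible scheme, sketched here purely as an assumption, records a hash of every input that influences the generated file in a small manifest stored next to the asset; the function names and the inputs_sha256 key are invented for this example.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(input_paths):
    """Hash the names and contents of all inputs that affect the generated asset."""
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in input_paths):
        digest.update(path.name.encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def record_asset(manifest_path, input_paths):
    """After a build, store the fingerprint of the inputs that produced the asset."""
    manifest = {"inputs_sha256": fingerprint(input_paths)}
    Path(manifest_path).write_text(json.dumps(manifest))


def asset_is_current(manifest_path, input_paths):
    """True if the recorded fingerprint matches the inputs as they are now."""
    manifest = Path(manifest_path)
    if not manifest.exists():
        return False
    recorded = json.loads(manifest.read_text())
    return recorded.get("inputs_sha256") == fingerprint(input_paths)
```

A build script would call asset_is_current first, download the pre-built asset when it returns True, and otherwise run the hours-long generation followed by record_asset (and, optionally, an upload to the asset server).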