For our 3rd party software to work properly, I have to keep two copies of a file (same name, same content) that is tracked by Git LFS, in two different directories of the same repo. This is a limitation of the 3rd party software that I have to live with.
- ./directory1/large_file.crg
- ./directory2/large_file.crg
Git LFS recognizes that they are the same file and only downloads it once to the local repo; the other copy just gets a pointer to the location of the real file. This causes problems with the 3rd party software, because it cannot read the pointer.
Is there any way I can force Git LFS to duplicate the file instead of point to it? Or can someone please point to me where this behavior is documented so I can explain it to colleagues?
Answer:
TL;DR
Don't use symlinks in repos that are meant to be used on both Windows and Linux/Mac.
The problem with symlinks
Based on the comments under the question, it turns out the repo did not have two copies of the big file, but rather one copy, and a Linux-style symlink pointing to it.
Git fully supports symlinks - you can add them, modify them, check them in, and check them back out. As long as you stay on Linux or macOS, everything works fine.
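But Windows does not handle Linux-style symlinks the way Git expects: Git for Windows usually runs with core.symlinks set to false, and in that mode it checks each symlink out as a small plain-text file containing the link's target path. That is the broken file you describe in the comments under the question.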
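To check whether a repo has this problem, you can list the entries Git tracks as symlinks: in the index they carry file mode 120000. Below is a minimal sketch in Python, assuming git is on the PATH and the script runs inside the repository; parse_ls_files_z and tracked_symlinks are hypothetical helper names, not part of Git. It uses git ls-files -s -z, whose NUL-separated entries have the form "<mode> <object> <stage><TAB><path>".

```python
import subprocess
from typing import List

SYMLINK_MODE = "120000"  # Git's file mode for symbolic links


def parse_ls_files_z(output: str) -> List[str]:
    """Return the paths of symlink entries in `git ls-files -s -z` output."""
    links = []
    for entry in output.split("\x00"):  # -z terminates entries with NUL
        if not entry:
            continue
        meta, _, path = entry.partition("\t")  # "<mode> <object> <stage>"
        if meta.split(" ", 1)[0] == SYMLINK_MODE:
            links.append(path)
    return links


def tracked_symlinks(repo_dir: str = ".") -> List[str]:
    """Ask Git for the index entries and keep only the symlinks."""
    out = subprocess.run(
        ["git", "ls-files", "-s", "-z"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return parse_ls_files_z(out)


if __name__ == "__main__":
    for path in tracked_symlinks():
        print(path)
```

Any path it prints will show up on a Windows checkout with core.symlinks=false as a tiny text file instead of the real content.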
The solution: get rid of symlinks in cross-platform repos
It might not seem like a nice solution, but if your repo is meant to be used on Windows and *nix OSs, avoid symlinks.
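The conversion has to happen on a machine where the symlinks actually resolve (Linux or macOS). A minimal sketch, assuming every symlink you want to convert points at a regular file inside the working tree; replace_symlinks_with_copies is a hypothetical helper, not a Git command. After running it, git add the changed paths and commit: Git then records them as regular files, and if a path matches an LFS pattern in .gitattributes, the LFS clean filter stores it as an LFS object as usual.

```python
import os
import shutil
from typing import List


def replace_symlinks_with_copies(root: str) -> List[str]:
    """Replace every symlink to a regular file under `root` with a real copy.

    Symlinks to directories and dangling links are left untouched, and the
    .git directory is skipped. Returns the relative paths that were changed.
    """
    replaced = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Never descend into Git's own metadata.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)  # follow the whole link chain
            if not os.path.isfile(target):
                continue  # dangling link: nothing to copy
            os.remove(path)             # remove the link itself, not the target
            shutil.copy2(target, path)  # write a real file with the same bytes
            replaced.append(os.path.relpath(path, root))
    return sorted(replaced)
```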
You will still save some space, at least on the Git server and in Git LFS storage, because Git is smart enough to reuse the same object when a repo contains two identical files. In fact, it is unable to do anything else: objects are stored and retrieved by the hash of their content (SHA-1 for Git blobs in a default repo, SHA-256 for Git LFS objects), so identical content always maps to the same object. You'll just have two copies of the file in each sandbox (working copy) that uses that repo.
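To make the content-addressing point concrete, here is a small Python sketch that computes both IDs for a piece of content: the Git blob ID (SHA-1 over a "blob <size>" header, a NUL byte, and the bytes, assuming the default SHA-1 object format) and the Git LFS object ID (SHA-256 over the raw bytes), plus the pointer text LFS commits in place of the content. The function names are hypothetical; the hash construction and the pointer layout follow Git's object format and the Git LFS v1 pointer spec.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object ID Git assigns to a blob (default SHA-1 object format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def lfs_oid(data: bytes) -> str:
    """Git LFS object ID: SHA-256 of the raw file content."""
    return hashlib.sha256(data).hexdigest()


def lfs_pointer(data: bytes) -> str:
    """Pointer file text that Git LFS commits instead of the content."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{lfs_oid(data)}\n"
        f"size {len(data)}\n"
    )


if __name__ == "__main__":
    content = b"hello\n"
    # Two copies with identical bytes, e.g. in directory1/ and directory2/.
    copy1, copy2 = content, bytes(content)
    assert git_blob_id(copy1) == git_blob_id(copy2)  # one Git blob
    assert lfs_pointer(copy1) == lfs_pointer(copy2)  # one LFS object
    print(git_blob_id(copy1))  # ce013625030ba8dba906f756967f9e9ca394464a
    print(lfs_pointer(copy1), end="")
```

Because both copies hash to the same IDs, the server and LFS store keep a single object; only each checkout materializes the bytes twice.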