How can I track a subset of files from a remote repository?

Time:10-05

I'm trying to solve the following problem: I'd like to include a public project that I don't own in mine, trimming the original file tree by removing redundant or unneeded files so that only the bare minimum remains, BUT also keeping the ability to track modifications to the original files.

I've tried making my own copy of said repository and adding the original as a remote, but that only works until I start deleting files from my own copy. After that, trying to bring in the remote changes fails because I'm missing files.

Is that normal? Did I mess something up in the process, and is there a more elegant way to accomplish this?

CodePudding user response:

Because you have committed file removals in your copy, your branch and the remote branch diverge as soon as any later commits exist on the remote, and a plain git pull can then stop with a "divergent branches" error until you tell it how to reconcile the two. However, in your case, a git rebase should do what you want.

In simple terms, a rebase will just reapply your commits onto a selected commit of the original repository (typically origin/main). You will end up with a copy of the current origin/main minus the files you chose to remove. Check the git-rebase documentation for details.

Here's an example:

# Clone a repository and remove some files from my local copy
git clone https://github.com/some_repo
cd some_repo
git rm file_a file_b
git commit -m "remove unneeded files"
git rm file_c
git commit -m "remove file_c"

# At a later time, fetch the new commits from the
# remote repository and rebase my commits (removals)
# atop the updated content
git fetch origin
git rebase origin/main

CodePudding user response:

The short answer is that you can't do it this way. Git is based on commits, not files, and every commit holds a full snapshot of every file. This means that if you make a new commit in which some file does not exist, the difference between the old commit and the new commit is that the file was deleted. Bringing in a later commit from the other repository always requires some kind of merge work, whether that's a cherry-pick performed by a rebase, a manual cherry-pick, or a git merge operation, because all of these perform the merge-as-a-verb action. Each of them will treat your deletion of the file as just that: a deletion of the file.

That's not ultimately fatal (because you can resolve the modify/delete conflict whichever way you need to), but it's a bad plan in general.
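To make the modify/delete conflict concrete, here is a minimal sketch that reproduces it in a throwaway sandbox and resolves it by keeping the deletion. Everything in it is an assumption for illustration only: the local repositories upstream and mine, the files file_a and keep.txt, the branch name main, and the demo identity set through environment variables. It is not the only way to resolve such a conflict, just one way the situation could play out.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the original project, with two files on branch "main".
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
cd "$work/upstream"
echo "v1" > file_a
echo "v1" > keep.txt
git add file_a keep.txt
git commit -q -m "initial import"

# My copy: clone it and commit the removal of file_a.
git clone -q "$work/upstream" "$work/mine"
cd "$work/mine"
git rm -q file_a
git commit -q -m "remove file_a"

# Upstream later modifies both files, including the one I deleted.
cd "$work/upstream"
echo "v2" > file_a
echo "v2" > keep.txt
git commit -q -am "update both files"

# Back in my copy: fetch, then replay my removal on top of upstream.
cd "$work/mine"
git fetch -q origin
if ! git rebase origin/main >/dev/null 2>&1; then
    # The rebase stopped on the modify/delete conflict for file_a.
    git status --short
    # Resolve it by keeping my deletion, then let the rebase finish.
    git rm -q file_a
    GIT_EDITOR=true git rebase --continue >/dev/null
fi

# keep.txt now carries the upstream change; file_a is still gone.
ls
```

Resolving the other way (keeping the upstream version) would mean running git add file_a instead of git rm file_a at the conflict. Either way, the same conflict can come back each time upstream touches a file you deleted, which is why this is workable but tedious.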

In any case, a repository is not allowed to contain another repository, so if you have your own repository and you'd like to clone some other repository and use a subset of it, you're faced with one of the following:

  • incorporating all of their files directly into your own repository, after which your commits and their commits are unrelated and hence Git can't help much; or
  • incorporating all your files into their repository, which is likely to be "upside down" from the way you want things to be; or
  • using submodules, which have their own issues.

In general, submodules tend to be the favored approach here, painful as they are (people call them sob-modules for a reason). A lot of Google software, for instance, uses submodules this way.
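For the submodule route, a minimal sketch of the workflow might look like the following. The repositories upstream and myproject, the path vendor/upstream, the file lib.txt, and the branch name main are all hypothetical. The protocol.file.allow=always setting is only there because this sandbox uses a local path as the submodule URL, which recent Git versions refuse by default; with a normal https URL it should not be needed. Note that a submodule pulls in the whole upstream tree, so this tracks the original project rather than trimming it.

```shell
#!/bin/sh
set -eu

# Hypothetical identity so commits work without any global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)

# Stand-in for the original (not owned) project.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
echo "v1" > "$work/upstream/lib.txt"
git -C "$work/upstream" add lib.txt
git -C "$work/upstream" commit -q -m "initial import"

# My own project, which references upstream as a submodule.
git init -q "$work/myproject"
git -C "$work/myproject" symbolic-ref HEAD refs/heads/main
cd "$work/myproject"
git -c protocol.file.allow=always \
    submodule --quiet add -b main "$work/upstream" vendor/upstream
git commit -q -m "add upstream as a submodule"

# Upstream publishes a new commit later on.
echo "v2" > "$work/upstream/lib.txt"
git -C "$work/upstream" commit -q -am "update lib.txt"

# Move the submodule to the newest upstream commit on its tracked
# branch and record the new pointer in my project.
git -c protocol.file.allow=always \
    submodule --quiet update --remote vendor/upstream
git add vendor/upstream
git commit -q -m "bump upstream submodule"

cat vendor/upstream/lib.txt
```

The design point is that your repository stores only a pointer to an upstream commit plus the .gitmodules entry, so their history and yours stay separate and updating is an explicit, recorded step.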
