git: Keep files/folder in upstream repo, but don't include in source

Time:05-09

I wanted to include a folder called screenshots/ in my git repo (mainly to support the README) to show... well screenshots of the app I created, so I did. However, the next time I tried to push a new commit to the repo, I got an error urging me to pull down the changes made in the upstream repo.

Obviously, I do not want a bunch of screenshots in my project's source. Is there a way to "ignore" certain directories in the upstream?

I know there's .gitignore, but that's for keeping certain files or folders in your local working copy without versioning them. What I want is sort of the opposite of a .gitignore.

Answer:

It helps to think about what Git stores. Git doesn't store folders in the first place; instead, Git stores commits, and commits in turn store (in part) files, as a sort of archive, frozen in time. The files have long names that may include slashes, such as path/to/file.ext. Your OS may (in fact, almost certainly does) require that this file be extracted as a file named file.ext that lives in a folder named to, which in turn lives in a folder named path. Your OS may call this path\to\file.ext rather than path/to/file.ext. But to Git, it's just a file named path/to/file.ext, complete with the (forward) slashes in the file name.[1]
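To see the "files with slashes, no folders" idea in action, here is a minimal sketch. The throwaway repository and the names empty-dir and path/to/file.ext are made up for illustration. It stages everything in a directory that contains one file and one empty folder, then asks Git what it is tracking.

```shell
set -e

# Throwaway repository in a temporary directory (hypothetical names throughout).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# One real file, nested two folders deep, plus one empty folder.
mkdir -p path/to empty-dir
echo 'hello' > path/to/file.ext

# Stage everything Git is willing to stage.
git add .

# Git lists a single entry whose name contains slashes; empty-dir is not tracked.
tracked=$(git ls-files)
echo "$tracked"
```

The empty folder never makes it into the index, which is why it cannot make it into a commit either.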

Anyway, the real point of the above is to note that the unit of storage, in a Git repository, is the commit. Each commit:

  • Is numbered, with a unique, very large, random-looking number normally expressed in hexadecimal.
  • Is entirely read-only. This is necessary for the numbering technique to work, and it means that no part of any commit can ever be changed once the commit is made. (The git commit --amend option, for instance, is a bit of a lie: it just makes a new-and-improved replacement commit, shoving the old commit aside.)
  • Stores two things:
    • Each commit holds a full snapshot of every file, in the form it had when you (or whoever) made the commit. This stored form of each file can occupy very little disk space, sometimes literally no space at all for this particular commit, because the files are de-duplicated within and across commits. But only Git can read these files, and literally nothing, not even Git itself, can overwrite them.
    • Each commit stores some metadata, or information about the commit itself. This includes things like the name and email address of the person who made the commit.
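The properties in this list can be checked by hand. The following is a rough sketch in a throwaway repository; the file name, commit messages, and user identity are placeholders. It amends a commit and then confirms that the original commit still exists under its old hash ID, and that both commits share the same stored file.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'v1' > file.txt
git add file.txt
git commit -q -m "first attempt"
old=$(git rev-parse HEAD)

# "Amending" builds a replacement commit; the message changes, so the hash must too.
git commit -q --amend -m "second attempt"
new=$(git rev-parse HEAD)
echo "old: $old"
echo "new: $new"

# The old commit is still in the repository, untouched.
old_type=$(git cat-file -t "$old")
old_subject=$(git log -1 --format=%s "$old")

# De-duplication: both snapshots point at the very same stored file object.
old_blob=$(git rev-parse "$old:file.txt")
new_blob=$(git rev-parse "$new:file.txt")

# The raw commit: a tree (the snapshot) plus metadata such as author and committer.
git cat-file -p "$new"
```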

The stored snapshot in any given commit is extracted to a work area at the time you run git checkout or git switch. This command means: Rip out all the extracted files from any previously checked-out commit, and put in place all the files from the newly selected commit. There are a lot of finicky details here that I'm glossing right over on purpose, but that's the basic working of git checkout or git switch (whichever command you prefer).

Having checked out some commit, you can now work on / with the files. The files that you work on / with are not in Git. They are in your working tree or work-tree, stored as ordinary files, in folders as required by your OS. You can do anything you like with these files because they are ordinary files, not under Git's control in any way. When you're done doing things with those files, you must run git add.[2] This tells Git to scan through those files and look for updates, which Git will copy into its index if / as needed, to be ready for the next git commit. Then you run git commit, and Git turns everything that's in its index (all the originally checked-out files, except as updated by the files you ran git add on) into a new snapshot for the new commit.
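A small sketch of that edit / add / commit cycle, using a made-up notes.txt in a throwaway repository: editing the working-tree file changes nothing in Git until git add copies the update into the index.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo 'v1' > notes.txt
git add notes.txt
git commit -q -m "initial"

# Edit the ordinary file in the working tree; the index still holds v1.
echo 'v2' > notes.txt
staged_before=$(git diff --cached --name-only)

# Copy the updated file into the index, ready for the next commit.
git add notes.txt
staged_after=$(git diff --cached --name-only)

# The new snapshot is built from the index.
git commit -q -m "update notes"
committed=$(git show HEAD:notes.txt)

echo "staged before add: '$staged_before'"
echo "staged after add:  '$staged_after'"
echo "in new commit:     '$committed'"
```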

What this means is simple enough:

  • either the screenshots/* files are in a commit, or they aren't;
  • if they're in the commit you check out, they'll be in the next commit you make, and if not, they won't—unless you use git add or git rm to change the situation.
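Both points can be sketched in a throwaway repository (the file names here are invented stand-ins): a commit that only touches the source still carries the screenshots along, and they only disappear from later snapshots once git rm takes them out of the index.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir src screenshots
echo 'code v1' > src/app.txt
echo 'fake image' > screenshots/shot.png   # stand-in for a real screenshot
git add .
git commit -q -m "source plus screenshots"

# Change only the source; the screenshots ride along into the next commit.
echo 'code v2' > src/app.txt
git add src/app.txt
git commit -q -m "change source only"
after_edit=$(git ls-tree -r --name-only HEAD)

# Remove the screenshots from the index (and working tree), then commit.
git rm -r -q screenshots
git commit -q -m "drop screenshots"
after_rm=$(git ls-tree -r --name-only HEAD)

echo "after editing source:"; echo "$after_edit"
echo "after git rm:";         echo "$after_rm"
```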

That, in turn, means that the initial answer to your question is: no, you can't do that. But....


[1] Technically this is only true of the files as they appear in Git's index. Once an index's set-of-files gets frozen into a new commit, Git stores each individual path-name component in a tree object, and it takes multiple tree objects to assemble the full path name. But Git doesn't have you work with tree objects individually: instead, it copies a tree into the index, and makes you work with the index and then commit what's in the index. Since there are no folders in the index, there cannot be folders in a commit: Even though the internal tree structure would theoretically allow it, Git cannot store an empty folder. There is however a submodule trick, which requires using two Git repositories, that can fake it.


But wait, there's more! Or less!

Git does offer what it calls a sparse checkout. With sparse checkout, you tell Git: "Look, I know this commit I'm about to check out has a million files in it. But I only want to see and work on the src/ files, for instance. Don't bother extracting any of the screenshots/ or doc/ files at all. You don't need to make OS-level folders named screenshots and doc. You can re-use those files that I didn't see in the next commit I make, even though I won't and can't see them as I work. Now, go ahead and check out commit <insert hash ID or branch name>."

You must set up sparse checkouts carefully. Until recently this was mostly manual; newer Git versions (2.25 and later) have a git sparse-checkout command, aimed more at people who just use Git than at people who write new code for Git itself, that makes this easier. Anyone who doesn't set up sparse checkouts will see all the screenshots/ files.
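Here is one way this might look, as a sketch rather than a recipe. It assumes a reasonably recent Git that has the git sparse-checkout command (it first appeared in 2.25), and it uses a throwaway repository with invented src/ and screenshots/ contents. After limiting the working tree to src/, the screenshot file vanishes from disk but still shows up in the next commit.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

mkdir src screenshots
echo 'code v1' > src/app.txt
echo 'fake image' > screenshots/shot.png   # stand-in for a real screenshot
git add .
git commit -q -m "source plus screenshots"

# Ask Git to keep only src/ in the working tree.
git sparse-checkout set src

# Work on the source as usual; screenshots/ is no longer on disk.
echo 'code v2' > src/app.txt
git add src/app.txt
git commit -q -m "change source only"

# The new commit still contains the screenshot we never saw.
in_commit=$(git ls-tree -r --name-only HEAD)
echo "$in_commit"
```

Running git sparse-checkout disable should bring the hidden files back into the working tree.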

Alternatively, if nobody ever needs to see the screenshots/ files at the same time as they see the source files, you can just make two independent streams of commits:

  • One stream of commits holds screenshots. You git checkout screenshots or git switch screenshots to get the latest one. Git removes all the currently checked out files (perhaps all the source files) from your working tree, and puts in place the committed screenshot files. Now you can update the screenshots, then add and commit to get a new, later screenshot set.
  • Another stream of commits holds source files. You git checkout main or git switch dev or whatever to get the latest such commit. Git removes all the currently checked out files (whether those are sources or screenshots) and puts in place the files from that latest commit.

In this setup, you use either the screenshots or the sources, but never both at the same time. This works great; it's just a different mind-set to get used to. The git worktree feature (new in Git 2.5, and best not used too extensively unless your Git version is at least 2.15) makes it possible to have one completely separate working tree on the screenshots branch and a different, completely separate working tree on the main or dev or whatever branch. Just remember that it's not possible to have a combined working tree with files from both.[3]
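A sketch of the two-working-tree arrangement, assuming Git 2.23 or later (for git switch) and a throwaway repository. The directory names project and shots, the branch contents, and the identity are all invented for the example. The script first builds a main branch and an unrelated screenshots branch, then attaches a second working tree for the latter.

```shell
set -e

base=$(mktemp -d)
repo="$base/project"    # main working tree (hypothetical name)
shots="$base/shots"     # added working tree (hypothetical name)
mkdir "$repo"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is "main"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Stream 1: source files on main.
echo 'code' > app.txt
git add app.txt
git commit -q -m "source"

# Stream 2: screenshots on an unrelated branch.
git switch -q --orphan screenshots
mkdir screenshots
echo 'fake image' > screenshots/shot.png   # stand-in for a real screenshot
git add screenshots
git commit -q -m "screenshots"
git switch -q main

# Add a second working tree, checked out on the screenshots branch.
git worktree add "$shots" screenshots >/dev/null 2>&1

shots_branch=$(git -C "$shots" rev-parse --abbrev-ref HEAD)
git worktree list
```

Each directory now shows only its own branch's files, and commits made in either one land in the same repository.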

To set up the separate-stream-of-commits branch, use:

git switch --orphan screenshots

for instance. (The old git checkout has a --orphan option as well, but if you use it you must also run git rm -r ., which is a little scary.) This switch command empties out Git's index while also clearing out your working tree (though untracked files will remain in it). You can now create the screenshots/ directory, populate it, and run git add screenshots to add all the files that are in screenshots/*. The next git commit you run will create the branch with this first commit as its first and only commit. This branch is now separate from, and cannot easily be mingled with, the other branches, and when you git checkout main or git switch main, Git will remove all the committed screenshots/* files and re-populate the working tree with the source files.
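Put together, the whole sequence might look like the sketch below. It assumes Git 2.23 or later (for git switch) and runs in a throwaway repository; app.txt, shot.png, and the identity are placeholders. At the end it checks that the two branches share no history at all.

```shell
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is "main"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Ordinary source history on main.
echo 'code' > app.txt
git add app.txt
git commit -q -m "source"

# Start the unrelated branch: empty index, tracked files removed from the work-tree.
git switch -q --orphan screenshots
mkdir screenshots
echo 'fake image' > screenshots/shot.png   # stand-in for a real screenshot
git add screenshots
git commit -q -m "first screenshots"

on_screenshots=$(git ls-files)
commit_count=$(git rev-list --count HEAD)  # the branch has exactly one commit

# git merge-base fails when two branches have no common ancestor.
if git merge-base main screenshots >/dev/null 2>&1; then
  shared_history=yes
else
  shared_history=no
fi

# Back on main, the screenshot files are gone and the source is back.
git switch -q main
on_main=$(ls)

echo "screenshots branch tracks: $on_screenshots"
echo "main working tree holds:   $on_main"
echo "shared history:            $shared_history"
```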

Note that this separate-stream-of-commits trick is a lot like having a completely separate repository containing only the screenshots. All it gets you is that git clone gets you all the commits that go with "both repositories", as those commits are in a single repository instead of two repositories. If the screenshots are not actually part of the project, it may well be wiser to use a separate repository in the first place.


[3] It is possible to get both sets of files, but not with a plain git checkout or git switch. You need to tell Git to get you the latest commit from the sources, and the latest commit from the screenshots branch, and then you need to combine these two. There are some relatively straightforward ways to do this with, e.g., git archive, to put them into an area where nobody will do any work on / with them, but will have access to both at the same time in a single OS-level folder.
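One possible way to do the git archive combination, sketched under a few assumptions: Git 2.23 or later, a tar command on the PATH, and a throwaway repository whose branch contents and output folder name (combined) are invented here. Each branch's latest snapshot is exported and unpacked into the same plain folder, which is not a Git working tree.

```shell
set -e

base=$(mktemp -d)
repo="$base/project"     # the repository (hypothetical name)
out="$base/combined"     # plain folder that receives both sets of files
mkdir "$repo" "$out"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is "main"
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Build the two unrelated branches.
echo 'code' > app.txt
git add app.txt
git commit -q -m "source"
git switch -q --orphan screenshots
mkdir screenshots
echo 'fake image' > screenshots/shot.png   # stand-in for a real screenshot
git add screenshots
git commit -q -m "screenshots"
git switch -q main

# Export each branch's latest snapshot and unpack both into one folder.
git archive main        | tar -xf - -C "$out"
git archive screenshots | tar -xf - -C "$out"

ls -R "$out"
```

If both branches ever contain a file with the same path, the second extraction overwrites the first, so the order of the two git archive lines matters.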

The reason to be careful with git worktree add before Git version 2.15 is that there's a nasty bug with added work-trees not being checked correctly during git gc. If you have an added work-tree sitting around for 2 weeks or more, you can start losing work done in it. It's not guaranteed to break, but tempting fate like this is unwise.
