No files retrieved when making a checkout from a copied .git directory, just a list of relative paths of files


I'm learning to use Git and would like to use it to save my work, which is not meant to be public. The size of the .git folder is 1.2 GB, too big for a free private GitHub repository.

I planned to upload the .git folder to Google Drive frequently.

If my laptop is broken or stolen, I'll be able to copy the .git directory from Google Drive, but what then?

I realize that I don't know how to retrieve all the files of the Git repository from the copied .git directory.

Let's say I create a tmp directory, then copy .git inside tmp/, cd to tmp/ (which now contains the .git directory), and then run git checkout. No file is created in tmp/; the only output from git checkout is the list of relative paths of the files contained in the Git repository, each preceded by D:

D file1.txt
D file2.txt
D dir1/file3.txt
...

CodePudding user response:

Do not keep a real Git repo on Google Drive or any other shared folder/volume. It will end up corrupted and ruined.

You are looking for git bundle. The git bundle command does everything you need:

Bundles are used for the "offline" transfer of Git objects without an active "server" sitting on the other side of the network connection.

They can be used to create both incremental and full backups of a repository, and to relay the state of the references in one repository to another.

In your actual repo, it creates a single archive file that you can copy elsewhere for backup. And, in case of emergency or whatever, it lets you turn that bundle back into a normal, visible Git repository folder.
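A minimal sketch of that workflow (the bundle file name and paths here are just illustrative):

# in the original repository: write every ref (branches, tags, HEAD) into one file
git bundle create /path/to/GoogleDrive/myproject.bundle --all

# optional sanity check before relying on the backup
git bundle verify /path/to/GoogleDrive/myproject.bundle

# later, on a replacement machine: turn the bundle back into a normal repository
git clone /path/to/GoogleDrive/myproject.bundle myproject

Because a bundle is one ordinary file, copying it to Google Drive avoids the shared-folder corruption problem that a live .git directory runs into.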

CodePudding user response:

For a practical answer about what to do here, see matt's answer.

For "why not store a .git repo on [a] shared volume": the answer is that Git relies heavily on POSIX file semantics, when Git is doing its internal (.git directory) work. Private file systems, even on Windows and macOS systems, generally obey these semantics (sometimes within certain limits that Git has to live under, rather than using full POSIX semantics). Shared folders generally don't. The result is that things break—but usually not right away. Murphy's law means they break right before the big demo / test / other urgent deadline.

To answer your original question: running git checkout with no arguments is, effectively, very similar to running a limited git status operation. When given arguments, git checkout behaves like either git switch or git restore (new commands added in Git 2.23 to split up the otherwise overly complex git checkout).
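Roughly, the correspondence looks like this (branch and path names are placeholders):

# the branch-switching half of git checkout
git checkout somebranch       # old, overloaded command
git switch somebranch         # Git 2.23+ equivalent

# the file-restoring half of git checkout
git checkout -- path/to/file  # old, overloaded command
git restore path/to/file      # Git 2.23+ equivalent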

This particular "git-status-like" operation simply compares the files in your current working tree against the files listed (and hence indirectly stored) in Git's index. Having copied the .git directory but not the working tree, you have made a duplicate repository—one similar to the one that git clone --mirror would make, but with some minor but important differences—but you have not duplicated the working tree. The new repository is now using whatever directory contents might be in the same directory that holds the .git directory as its working tree. That is:

Let's say I create a tmp directory,

Since it's new, it's currently entirely empty (except perhaps for . and .. entries, and if you're on macOS and look at it with the Finder, Finder will create a .DS_Store).

then copy .git inside tmp/,

Assuming a proper recursive copy, you now have tmp/.git/HEAD, tmp/.git/refs/, tmp/.git/objects/, and so on: all the files inside the new .git that make Git recognize this as a repository. Unfortunately, you also have tmp/.git/index!
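For instance, a copy made like this (paths are hypothetical) ends up with all of those entries, index included:

cp -a ~/myproject/.git ~/tmp/
ls ~/tmp/.git    # HEAD, config, index, objects/, refs/, and so on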

cd to tmp/ (which now contains the .git directory) ...

Assuming $GIT_DIR and $GIT_WORK_TREE (environment variables) are unset, Git will now "discover" the current Git directory (and corresponding top level working tree) in the usual way, which involves starting with the current working directory, which is now this tmp/. There's a .git/ here with the right stuff in it, so that's the Git repository, and this is the top level of the working tree.
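You can see the result of this discovery directly (run from inside tmp/):

git rev-parse --git-dir        # prints .git
git rev-parse --show-toplevel  # prints the absolute path of tmp/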

The index in this Git repository (.git/index file, and perhaps some additional files that you've also copied, if Git is in split-index mode in the original repository) says that, in this working directory, there should exist files named file1.txt, file2.txt, dir1/file3.txt, and so on. It lists the hash IDs for the internal blob objects (which will be found in .git/objects/, either loose or packed) giving the contents for each such file.
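You can inspect exactly what the copied index claims should exist; the output looks something like this (blob hashes replaced by placeholders here):

git ls-files --stage

100644 <blob-hash> 0	file1.txt
100644 <blob-hash> 0	file2.txt
100644 <blob-hash> 0	dir1/file3.txt
...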

Running git checkout with no arguments causes Git to compare the list in the index to the actual files present in the working tree. But those files aren't present! Obviously, you deleted them.

Now, you didn't actually delete them; you omitted them from the copying process. In other words, you (deliberately) failed to copy them. But the effect is the same as if you had copied them and then deleted them, so Git claims "deleted". Git's end-effect-result is correct, even if its method for getting there is different.1
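If the goal at that point is just to get the files back, a minimal sketch (assuming the copied HEAD and index are intact) is to tell Git to recreate the working-tree files from the index:

git restore .       # Git 2.23+
git checkout -- .   # older, equivalent spelling

After that, file1.txt, file2.txt, dir1/file3.txt, and so on reappear in tmp/, and git checkout with no arguments stops reporting them as deleted.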

So: the bottom line is that copying a repository is never exactly the same as cloning a repository, because cloning involves dropping some things (reflogs, for instance) and perhaps doing some cleanup (packing or re-packing objects). This is particularly true when you use cp -r (or the local OS equivalent) to copy just the repository part, without copying the working tree. Git has no control over, and often no observation of,2 the working tree while it's in use. It just takes strategic snapshots when you run various commands. Running git checkout with no arguments is one of those commands.


1This same rule applies to git diff output. Suppose you have a file whose four lines are line 1, line 2, line 2, line 3. You commit this, then realize there are two line 2s. You delete one of them and commit. You display this commit, and Git claims you deleted the other of the two duplicate lines. That's wrong, but it's also right. It's not what you did, but it has the same effect. So it's right, even if it's wrong. It's all a question of which details actually matter.
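To illustrate footnote 1, the diff for that second commit might look roughly like this (file name and hashes are placeholders; exactly which of the two identical lines gets marked as removed depends on the diff algorithm, and it doesn't matter):

diff --git a/notes.txt b/notes.txt
index <old-hash>..<new-hash> 100644
--- a/notes.txt
+++ b/notes.txt
@@ -1,4 +1,3 @@
 line 1
 line 2
-line 2
 line 3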

2Git's file system monitor code, which is still under (perpetual?) development, is an attempt to change this for efficiency purposes. Given truly huge repositories (100 million files and 10 terabytes for instance), Git's existing scanning strategies, while clever, aren't enough to mitigate the cost of scanning. If Git could just know that you changed three specific files, that would be a lot cheaper. But keeping watch on a directory tree is hard, even on OSes that are attempting to make it less hard over time. Old versions of Git don't even try, and new ones need to accommodate FSMonitor failures.
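For what it's worth, on recent Git versions that ship the builtin monitor daemon (available on Windows and macOS; treat the exact version requirement as an assumption to check for your installation), it can be enabled per repository like this:

git config core.fsmonitor true
git config core.untrackedcache true   # commonly paired with it
git status                            # the first run starts the daemon; later runs should be faster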
