How to merge a Git repo into a parent repo while keeping its commit history?

I have always kept my files in the child repo. Now I also need to track a second child, so I renamed them child1 and child2. I would like to keep a single parent directory that contains both folders, without losing the commits from the child repo.

Before:

/
/docs/
/docs/child/
/docs/child/.git/           # repo at child level
/docs/child/file-a
/docs/child/file-b

After:

/
/docs/
/docs/parent/
/docs/parent/.git/          # parent repo contains all files and child1 commits
/docs/parent/child1/
/docs/parent/child1/file-a
/docs/parent/child1/file-b
/docs/parent/child2/
/docs/parent/child2/file-c

How can I achieve this simple setup?

Answer:

Remember that a Git repository contains commits, not files. (The commits then contain files, but we work commit-by-commit, not file-by-file.)

Your existing repository, which you had stored in docs/child (as the .git directory therein), contains a series of commits. Each commit has a full snapshot of every file.¹ The files in these commits are file-a and file-b, for instance.

You would now like this same repository to gain new commits in which the files are named child1/file-a, child1/file-b, and child2/file-c, for instance. That's easy to do: enter the repository's working tree, create the child1 and child2 sub-directories, and, for convenience,² use git mv to rename file-a to child1/file-a and file-b to child1/file-b. Then create the new file child2/file-c, run git add on it, and run git commit. This adds one new commit to the existing repository: in that commit the contents are stored under the new names, while all the existing commits still store the files under their old names.
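The steps above can be sketched as a shell session. This is a minimal sketch under stated assumptions: a throwaway repository in a temporary directory stands in for docs/child, and the file contents, identity, and commit messages are placeholders, not anything from the question.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway stand-in for docs/child (assumption: any temp dir will do).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

# "Before": a commit whose files are named file-a and file-b.
echo "alpha" > file-a
echo "beta" > file-b
git add file-a file-b
git commit -q -m "Original layout"

# Create the sub-directories first, then rename with git mv.
mkdir child1 child2
git mv file-a child1/file-a
git mv file-b child1/file-b

# Add the new child's file and record everything as one new commit.
echo "gamma" > child2/file-c
git add child2/file-c
git commit -q -m "Move files into child1, add child2"

# The new commit uses the new names; the older commit keeps the old ones.
git ls-tree -r --name-only HEAD
git ls-tree -r --name-only HEAD~1
```

Nothing in the old commit changes: the history simply gains one commit on top.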

Note that Git does not store directories: it just stores files whose names may or may not contain embedded (forward) slashes. Git will create, as needed, whatever directory your OS requires, because your OS insists that there's no such thing as a file named child1/file-a: that this is a directory named child1 holding a file named file-a. Git insists that, no, this is a file named child1/file-a; Git fully manages³ this mismatch between Git's idea of files and your OS's.
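A small sketch of what this looks like in practice, in a throwaway repository (the directory names are illustrative only): the index lists a full slash-separated path name, and an empty directory gives Git nothing to record.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
git init -q

# One directory holding a file, and one empty directory.
mkdir child1 empty-dir
echo "alpha" > child1/file-a

# Stage everything in the working tree; the empty directory contributes nothing.
git add -A

# The index holds a single entry whose name contains a slash.
git ls-files
```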

Remember: a Git repository holds commits. Git isn't about files or branches, but rather about commits. The commits hold files (which we need to get our work done), and the branch names help us (and Git) find the commits we want that contain the files we need. But at the repository level, Git is about commits. When you think about things stored in a Git repository, think about the commits. Each one holds a snapshot of all files, plus some metadata.
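To see that a commit is a snapshot plus metadata, you can print a commit object's raw contents with git cat-file -p. The sketch below uses a throwaway repository with a placeholder identity and messages; the output shows a tree line (the snapshot), a parent line, and author/committer metadata.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "alpha" > file-a
git add file-a
git commit -q -m "First commit"
echo "beta" > file-b
git add file-b
git commit -q -m "Second commit"

# Raw commit object: "tree" (the snapshot), "parent", "author",
# "committer", a blank line, then the message.
git cat-file -p HEAD

# The snapshot lists every file in the commit, not just the one that changed.
git ls-tree -r --name-only HEAD
```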

¹The files inside commits are compressed and Git-ified so that only Git can read them, and literally nothing at all can write them. They are de-duplicated as well (within and across commits) so the fact that every commit saves every file every time doesn't cause the repository to bloat to ridiculous sizes (though some binary files defeat this trick, and then the repository does bloat to ridiculous sizes, which is why it's unwise to store large binary files in Git).
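The de-duplication described in this footnote is visible in the object IDs. A minimal sketch, assuming a throwaway repository and illustrative file names: two files with identical contents in one commit resolve to the same blob, and git hash-object computes that same ID from the contents alone.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Two differently named files with byte-for-byte identical contents.
printf 'same contents\n' > file-a
printf 'same contents\n' > copy-of-a
git add file-a copy-of-a
git commit -q -m "Two identical files"

# Both paths resolve to the same blob ID, so the data is stored once.
blob_a=$(git rev-parse HEAD:file-a)
blob_copy=$(git rev-parse HEAD:copy-of-a)
echo "$blob_a"
echo "$blob_copy"

# The ID depends only on the contents, so hash-object reproduces it.
printf 'same contents\n' | git hash-object --stdin
```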

²As far as Git is concerned, there's no difference between removing some file named file-a and creating a totally new child1/file-a file that holds the same contents, vs renaming an existing file-a to child1/file-a. The final commits simply hold the contents; if the contents of child1/file-a in the new commit match, byte-for-byte, the contents of file-a in the old commit, they're de-duplicated. Still, it's a heck of a lot more convenient to use git mv here, unless you've already used plain mv or whatever renaming or restructuring commands your OS uses. If so, feel free to use git rm and/or git add to update Git's idea of the files' names and contents. Git won't care how you got from the old setup to the new setup: all it contains are commits, and those commits just hold snapshots of files (plus metadata, but the metadata do not contain renaming information).
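A minimal sketch of this footnote's point, using a throwaway repository and placeholder names: renaming with plain mv plus git add -A gives a commit whose blob ID is unchanged, and the rename only appears because git diff detects it by comparing contents (-M). With rename detection turned off, the same change shows up as a deletion plus an addition.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

echo "alpha" > file-a
git add file-a
git commit -q -m "Old layout"

# Rename with the OS's mv instead of git mv; git add -A records
# the removal of file-a and the addition of child1/file-a.
mkdir child1
mv file-a child1/file-a
git add -A
git commit -q -m "New layout"

# Same contents, same blob, in both commits.
git rev-parse HEAD~1:file-a HEAD:child1/file-a

# No rename is stored; diff infers it from contents (R100 = identical).
git diff --name-status -M HEAD~1 HEAD

# Without rename detection: one deletion and one addition.
git diff --name-status --no-renames HEAD~1 HEAD
```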

³For some value of "fully" anyway: the directory/file conflict code in Git has a history of the occasional small bug-ette. It's pretty good, all in all, considering how complicated this mismatch is, between Git's way of thinking and the OS's. Still, in some cases of complicated detected-after-the-fact renames across commits that result in files moving to new directories, some versions of Git are better than others.

Answer:

Based on torek's answer: since I did not need to merge two child repos into a parent one, I ended up reusing the original child repo as the parent. What I did was:

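```shell
# enter the child repo
cd child
# create the new child folder
mkdir child1
# move all files inside "child" from their original location to their multi-child destination
# the -k flag skips entries git mv cannot move (such as child1 itself) instead of aborting
# the shell's * does not match names starting with a dot (e.g. .gitignore); move those (other than .git) separately if needed
git mv -k * ./child1
# `git status` now shows every file as renamed: git mv has already staged the moves
# create child2 and move (or create) its files there, e.g. child2/file-c
mkdir child2
# stage child2's files and record the new layout as one new commit
git add child2
git commit -m "Move files into child1, add child2"
```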
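One way to check that the history survived the move is git log --follow, which traces a single file across renames. The sketch below replays this answer's commands (plus the final commit) in a throwaway repository standing in for docs/child; the file contents, identity, and messages are placeholders. Without --follow, only the commit that introduced the new path is listed.

```shell
#!/usr/bin/env bash
set -euo pipefail

cd "$(mktemp -d)"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"

# Stand-in history for the original child repo.
echo "alpha" > file-a
echo "beta" > file-b
git add file-a file-b
git commit -q -m "Add file-a and file-b"
echo "alpha v2" > file-a
git commit -q -am "Edit file-a"

# The restructuring from this answer.
mkdir child1
git mv -k * ./child1
mkdir child2
echo "gamma" > child2/file-c
git add child2
git commit -q -m "Move files into child1, add child2"

# --follow traces child1/file-a back through the rename to file-a.
git log --follow --oneline -- child1/file-a

# Without --follow, only the commit that created the new path appears.
git log --oneline -- child1/file-a
```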

Now the structure looks like this:

/
/docs/
/docs/child/
/docs/child/.git/
/docs/child/child1/
/docs/child/child1/file-a
/docs/child/child1/file-b
/docs/child/child2/
/docs/child/child2/file-c

Finally, you can rename the child folder to parent and, if needed, run git config -e inside it to point it at another remote repository.
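A sketch of that last step. The directory layout and both remote URLs are hypothetical placeholders (example.com is not a real Git host); git remote set-url is shown as an alternative to editing the config by hand with git config -e.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for /docs holding the existing child repo.
docs=$(mktemp -d)
git init -q "$docs/child"
git -C "$docs/child" remote add origin https://example.com/child.git

# Renaming the folder does not affect the repository: everything Git
# needs lives in the .git directory, which moves along with it.
mv "$docs/child" "$docs/parent"
cd "$docs/parent"

# Point origin at a different (hypothetical) remote, then check it.
git remote set-url origin https://example.com/parent.git
git remote -v
```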