Have new major version in new Git repo but keep old repo and have full history

Time:05-18

I am going to copy the Visual Studio solution that is now in repo A to a new solution B, and create a new repo on GitHub for it.

A will not be developed at all. But I need the whole history of A to be visible in B.

How can I do it?

Can I do it in VS without installing Git?


Visual Studio 2022 Pro / Windows 11

Answer:

A Git repository consists, at its heart, of two databases:

  • One database, usually by far the larger one, contains Git's commit objects and other objects. Each object has a hash ID (OID, or Object ID), which is how Git retrieves the object. Any given commit contains a full snapshot of all the files that go with that particular commit, frozen for all time in that commit. (Git does this by storing files as sub-objects that identical copies literally share, via de-duplication, and by compressing them, sometimes tremendously successfully. So the fact that every commit holds a full snapshot of every file doesn't bloat up the repository; some repositories are actually significantly smaller than the files they hold.)

    Each commit object also holds a list of the hash IDs of previous (parent) commits. This list is typically exactly one entry long. At least one commit (the very first one) has an empty list, since there was no previous commit at that point, and merge commits have two (or potentially more) parents in this list.

    This list of parents, stored in each commit, forms the commits into a Directed Acyclic Graph or DAG. The DAG, as stored in this big database, is the history. If you have the commits, you have the history. If you don't have the commits, you don't have the history. So that part is simple: just get all the commits and you'll have all the history.

  • A separate database contains human-readable names: branch names, tag names, remote-tracking names, and other names. Git needs to provide this to be usable by humans, because the hash IDs (OIDs) that Git uses to retrieve objects from the object database are random-looking and impossible for humans to deal with. By storing particularly important hash IDs under names, Git makes it easy for you to ask for "version v2.1" or "the latest commit on branch main": Git looks up the hash ID stored under that name. Each name stores just one hash ID, which is sufficient because of the DAG: starting from that one commit, Git follows the parent links to find every earlier commit.
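To see the two databases for yourself, here is a minimal sketch, assuming a POSIX shell and Git 2.28 or later (for `git init -b`). It builds a throwaway repository in a temporary directory, prints two commit objects (showing the tree and parent hash IDs each one stores), and then lists the names database. The directory, file, identity, and tag names are all made up for illustration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)                      # throwaway location for a hypothetical demo repo
git init -q -b main "$work/demo"
cd "$work/demo"
git config user.name "Example User"    # made-up identity, stored only in this repo
git config user.email "user@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first commit"
echo two >> file.txt
git commit -q -am "second commit"
git tag v1.0

# Objects database: the newest commit names its snapshot (tree) and one parent.
git cat-file -p HEAD
# The very first commit has no "parent" line: its parent list is empty.
git cat-file -p HEAD~1

# Names database: each name (here refs/heads/main and refs/tags/v1.0) stores one hash ID.
git for-each-ref --format='%(objectname) %(refname)'
```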

So:

I am going to copy the Visual Studio solution that is now in repo A to a new solution B, and create a new repo on GitHub for it.

Note that a VS "solution" is not just a Git repository, but as long as the entire solution is contained within a (single) Git repository, cloning the repository suffices.

To clone the original (repo A) repository, use git clone. This:

  • Copies the entire commits database to a new database that uses the same hash IDs (all Git hash IDs are universal: all Git software everywhere agrees to use the same OID for the same object, and this is the real magic in Git).

  • Reads and transforms the names out of the names database, to build a new names database. The new database keeps tag names unchanged, but turns the original repository's branch names into remote-tracking names in the new clone.

  • Last, does a final git switch or git checkout step to extract one particular commit from one particular branch. The new clone has no branch names yet (remember, Git does not need branch names, only the objects; it's humans who need names for them), so this would normally fail, but Git automatically creates a branch from a remote-tracking name.

The branch name that Git creates, in this last step, is the one you tell it to use when you run git clone. If you don't tell it some particular one, your Git software asks the other Git software (in this case on GitHub perhaps, wherever repo A is stored anyway) which name they recommend. Your Git then creates that as a branch name, using the hash ID stored in the remote-tracking name that your Git made from their branch name. (Whew!)

This is a lot clearer with an example. If they have branches named main and develop and you clone their repository, you get an origin/main and an origin/develop: remote-tracking names that remember their branch names. If they then recommend that your Git should create main, your Git uses your origin/main—which your Git made from their main—to create your branch name main, selecting the same commit as origin/main, which selects the same commit as their main. So you now have one branch.
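The main/develop example can be reproduced locally. This is a minimal sketch, assuming a POSIX shell and Git 2.28+, in which a local repository in a temporary directory stands in for repo A (with a real repo A you would pass its URL to git clone instead); the branch names match the example and the commit identity is made up.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)

# Stand-in for repo A: one commit on main, plus a develop branch.
git init -q -b main "$work/A"
git -C "$work/A" -c user.name="Example User" -c user.email=user@example.com \
    commit -q --allow-empty -m "first commit"
git -C "$work/A" branch develop

# Clone it: their main and develop become our origin/main and origin/develop.
git clone -q "$work/A" "$work/clone"
git -C "$work/clone" branch -r   # remote-tracking names, including origin/main and origin/develop
git -C "$work/clone" branch      # only one branch name: main
```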

The history in your clone is the set of commits in your clone, as found by all the names: tag names, remote-tracking names, and the one new branch name. You may, if you desire, create additional branch names to remember specific commits; you can create a branch name for each remote-tracking name, for instance. But typically you'll probably just want main or master anyway: any commits that are currently findable only through origin/develop, for instance, may well be irrelevant, and you need not retain them.
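If you do want a branch name for every remote-tracking name, a loop over the origin/* names can create them. This is a minimal sketch, again with a local stand-in for repo A (assuming a POSIX shell and Git 2.28+). It skips origin/HEAD, which is only an alias for another remote-tracking name, and any branch that already exists, such as the main that git clone created.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q -b main "$work/A"                      # stand-in for repo A
git -C "$work/A" -c user.name="Example User" -c user.email=user@example.com \
    commit -q --allow-empty -m "first commit"
git -C "$work/A" branch develop
git clone -q "$work/A" "$work/clone"
cd "$work/clone"

# refname:strip=3 turns refs/remotes/origin/develop into develop.
for name in $(git for-each-ref --format='%(refname:strip=3)' refs/remotes/origin); do
    case "$name" in HEAD) continue ;; esac      # origin/HEAD only points at origin/main
    if ! git show-ref --verify -q "refs/heads/$name"; then
        git branch -q --track "$name" "origin/$name"
    fi
done
git branch                                      # now lists develop and main
```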

Now that you have a clone, you can push to a new empty repo B on GitHub

It's now time to create a repository on GitHub: repo B.

You can do this with the gh command-line interface (see ) from a shell like bash or zsh or PowerShell, for instance, or you can do this with the web interfaces that GitHub provide. Either way, make this an empty repository: if you use the web page, don't ask GitHub to add a README, .gitignore, or license file, since that would create a commit. An empty repository has an empty pair of databases: no commits or other objects, and no names. But it still exists, even though it's empty.

Now that repo B exists on GitHub, you will use the command line interface (via bash or zsh or PowerShell or whatever, again) to tell your Git software to replace the origin URL. When you ran git clone, your Git software saved the URL for repo A, under the name origin. But you no longer want to talk to software that accesses repo A. You now want to talk to GitHub software that accesses repo B. So:

git remote set-url origin <url>

Replace the <url> part with the actual URL for repo B, whether that's https://github.com/... or ssh://git@github.com/... (either will work, but each needs some GitHub-side setup, such as credentials for HTTPS or an SSH key for SSH, that you must do separately). Then use git remote -v to make sure you have set everything correctly.
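Here is a minimal sketch of this step, assuming a POSIX shell and Git 2.28+. To keep it runnable offline, a local bare repository stands in for the empty repo B; with GitHub you would give set-url the https:// or ssh:// URL instead.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q -b main "$work/A"                      # stand-in for repo A
git -C "$work/A" -c user.name="Example User" -c user.email=user@example.com \
    commit -q --allow-empty -m "first commit"
git clone -q "$work/A" "$work/clone"
git init -q --bare "$work/B.git"                   # stand-in for the empty repo B

cd "$work/clone"
git remote -v                          # fetch and push URLs still point at repo A
git remote set-url origin "$work/B.git"
git remote -v                          # both now point at repo B
```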

Once the remote origin is your repo B, run git push --all origin to populate the two databases in the GitHub copy. This will fill in all reachable objects for all the branches you've created, and also set up all their branch names. Then run git push --tags origin to update their names database with all of your tags.
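The two pushes can be tried out locally as well. In this minimal sketch (POSIX shell, Git 2.28+), a fresh repository with two branches and a tag plays the role of your clone, and a local bare repository plays the role of repo B; git ls-remote then lists the names that arrived there.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q --bare "$work/B.git"                   # stand-in for the empty repo B
git init -q -b main "$work/clone"                  # stand-in for your clone of A
cd "$work/clone"
git -c user.name="Example User" -c user.email=user@example.com \
    commit -q --allow-empty -m "first commit"
git branch develop
git tag v1.0
git remote add origin "$work/B.git"

git push -q --all origin     # every branch name, plus the objects those branches reach
git push -q --tags origin    # every tag name
git ls-remote origin         # main, develop and v1.0 now exist in B
```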

Note that commits that you have locally that are only findable via remote-tracking names (e.g., origin/develop) won't be sent to repo B here. If you want them to be sent, create a branch name:

git switch develop

That command notices that you don't have a develop, but you do have an origin/develop, so it uses the commit hash ID saved under origin/develop to create a new develop that holds the same hash ID. You can then run git push origin develop (or git push --all origin again) to create the name develop on GitHub and fill in the missing objects in their objects database.
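The git switch step can be sketched the same way (POSIX shell; Git 2.28+ for init -b, and Git 2.23+ for git switch). Local repositories stand in for A and B. Note that after set-url, the old origin/develop name is still present in the clone until you next fetch from the new origin, and that leftover name is what git switch uses.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q -b main "$work/A"                      # stand-in for repo A
git -C "$work/A" -c user.name="Example User" -c user.email=user@example.com \
    commit -q --allow-empty -m "first commit"
git -C "$work/A" branch develop
git clone -q "$work/A" "$work/clone"
git init -q --bare "$work/B.git"                   # stand-in for repo B

cd "$work/clone"
git remote set-url origin "$work/B.git"
# No local develop exists, but origin/develop does, so this creates develop from it.
git switch -q develop
git push -q origin develop    # creates develop in B and sends the objects it needs
```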

If you create all the branch names you want before the git push --all origin step, you'll get all the objects and branch names transferred, since --all means send all branch names.
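A different route, not the one described here but, as far as I know, the one GitHub's help page on duplicating a repository uses, skips the branch-by-branch work. A bare clone stores A's branches directly as branch names (there are no origin/* names), and git push --mirror then sends every branch and tag at once. This is a minimal sketch with local stand-ins for A and B (POSIX shell, Git 2.28+). Be aware that --mirror also deletes any name in B that the bare copy lacks, which is harmless here only because B starts out empty.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
git init -q -b main "$work/A"                      # stand-in for repo A
git -C "$work/A" -c user.name="Example User" -c user.email=user@example.com \
    commit -q --allow-empty -m "first commit"
git -C "$work/A" branch develop
git -C "$work/A" tag v1.0
git init -q --bare "$work/B.git"                   # stand-in for the empty repo B

# A bare clone stores A's branches directly as branches and copies its tags.
git clone -q --bare "$work/A" "$work/A-copy.git"
# --mirror makes B's refs match the copy exactly: all branches and all tags.
git -C "$work/A-copy.git" push -q --mirror "$work/B.git"
```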

Optional: Saving space on GitHub and enabling pull requests

When you do all the above, you get an independent repo B on GitHub. That is, this new repo B has no connection to any other GitHub repository, so you can't make pull requests between repo B and repo A on GitHub. If that's what you want, that's all fine.

If repo A exists on GitHub before you go to create repo B, though, GitHub offer you a way to make a full copy of repo A with one web-page button click. Using this button will copy both databases, i.e., you'll get all the objects (the full history) and the branch and tag names. (Newer versions of the fork page offer an option to copy only the default branch; leave it unchecked if you want every branch. GitHub do not use remote-tracking names, so there's nothing to copy there. Remote-tracking names are for your laptop Git repository.)

Using this button—a big green FORK button—helps them (GitHub) out by saving a bunch of disk space on their systems, and helps you out if you ever want to create pull requests. You said:

A will not be developed at all.

which indicates that you never intend to create any pull requests to repo A. But if you ever change your mind about this, you might wish you had used the "fork the repo" button and saved space over on GitHub. That's entirely up to you, though.

A few last items

Can I do it in VS without installing Git?

Probably not, but I don't use Visual Studio.

I am going to copy the Visual Studio solution ...

As I understand it, a "solution" includes a .sln file. This file normally goes in the repository, so copying the commits will get you the original solution file. Since history (the existing commits in any Git repository) is completely frozen for all time, the historical copies of this solution file may still have the old names in them. You cannot fix that except by replacing every historical commit with a new and different commit (one that contains a different file). This is probably not worth doing, but again, I don't actually use VS.
