Merging Git-Repos after migration from RTC-Jazz

We have been working with Jazz-RTC for around 15 years and are now forced to migrate to Git within a short time frame.

Our workflow was to create a stream for each release. Each stream contains components that represent the different folders of the project, e.g. server, gui, doc, db.

Over time we added new components to newer streams so that the code-base now looks something like this:

V1.0
 |_server
 |_gui
 |_db
V2.0
 |_server
 |_gui
 |_db
 |_doc
V2.5
 |_server
 |_gui
 |_db
 |_doc
V3.0
 |_server
 |_gui
 |_db
 |_doc
 |_reports
....

Our migration script takes each component (server, gui, ...) of each stream (V2.0, V3.0, ...) and creates a separate Git repository from it.

The change sets are applied as commits in the respective repository, so we have retained the history of every component. This also means that the repositories have no branches, just a linear commit history on a single (master) branch.

Obviously, there is duplicated code across the Git repos. E.g. the V2.0 server repo contains mostly the same files as the V3.0 server repo, with only minor changes to some files.

What we'd like to do now is combine these different Git repositories into one, so that the structure looks something like this:

Combined_Project
  |_server
  |_gui
  |_db
  |_doc
  |_reports

Of course, we need the history of file changes (i.e. the commits) to be in the right order (ordered by date).

We would prefer a Git-internal solution for this task, but we would also accept third-party tools.

I have researched this topic for days now, but the more info I find about it, the more confused I get.

Doing a simple git remote add -f V2.0gui <gui-from-other-repo> followed by git merge V2.0gui/master creates a merge commit and merges the repositories, but in the logs I see that the commits are not in the right order (e.g. we have commits from March 2022 that come before commits from January 2022).
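For reference, the attempt described above can be reproduced with a small self-contained sketch. The repository layout, file names, dates and the `demo_merge_log` function are made up for illustration. It also assumes (a guess, not something confirmed in this question) that the migration stored the original check-in time as the author date and a later time as the committer date. Since Git 2.9, merging two repositories with no common ancestor needs `--allow-unrelated-histories`. Listing with `git log --author-date-order` only changes how the log is displayed; the commit graph, including the merge commit, stays as it is.

```shell
# Hypothetical demo: two unrelated one-branch repos, merged as in the question.
demo_merge_log() (
  set -euo pipefail
  work=$(mktemp -d)
  export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
  export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

  # commit_at <repo> <file> <author date> <committer date> <subject>
  commit_at() {
    git init -q "$1"
    git -C "$1" symbolic-ref HEAD refs/heads/master
    echo "$5" > "$1/$2"
    git -C "$1" add "$2"
    GIT_AUTHOR_DATE=$3 GIT_COMMITTER_DATE=$4 git -C "$1" commit -q -m "$5"
  }
  commit_at "$work/server" server.txt '2022-01-10 12:00:00 +0000' '2023-06-01 10:00:00 +0000' s1
  commit_at "$work/server" server.txt '2022-03-10 12:00:00 +0000' '2023-06-01 10:01:00 +0000' s2
  commit_at "$work/gui"    gui.txt    '2022-02-10 12:00:00 +0000' '2023-06-02 10:00:00 +0000' g1

  cd "$work/server"
  git remote add -f V2.0gui "$work/gui" >/dev/null 2>&1
  git merge -q --no-edit --allow-unrelated-histories -m 'Merge gui' V2.0gui/master >/dev/null
  # List subjects by author date (children still come before their parents).
  git --no-pager log --author-date-order --format=%s
)
```

Calling `demo_merge_log` prints the commit subjects, newest author date first after the merge commit. If the displayed order is the only problem, this may be enough; if a single branch without merge commits is required, the commits themselves have to be rewritten.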

I have also tried to rebase the "remote" repositories into a common repo, but this messes up the commit history as well.

The question is: how is this task best tackled? What tools or strategies would you use?

Update: As the whole code base has been worked on in a linear fashion, a single Git repository with no branches would suffice as the result. This means that the commits of the different repositories should all end up on the master branch of the resulting repo, ordered by their check-in/commit date.

CodePudding user response：

Commits connect directly to previous commits, by hash ID. This forms the Directed Acyclic Graph (DAG) of commits. Commits are also immutable, so to combine two separate Git repositories with two separate graphs:

A--B--C   <-- master   [in repo 1]

D--E--F--G--H   <-- master   [in repo 2]

into a single combined repository with a graph such as:

      C'  <-- v1-branch
     /
AD-BE
     \
      F'-G'-H'  <-- v2-branch

where AD is either A or D (because they're essentially identical) and BE is either B or E (for the same reason), you can literally copy A and B, or D and E, into a fresh, new, empty repository (but not, for instance, A and E, since E always points back to D). When you then go to copy C, you may have to replace it with a new commit C' that has the same snapshot as C but a different parent hash ID. If you took A-B as is, you can take C as is, but now you have to replace F with F' so that it points back to B instead of E, then replace G with G' so that it points back to F', and so on.
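The reason F has to become F' can be seen in a minimal sketch: the parent hash ID is part of what gets hashed, so the same snapshot, author, dates and message with a different parent yields a different commit ID. The function name `demo_parent_changes_hash` and the fixed identity and dates are made up for this illustration; the commit names follow the diagrams above.

```shell
# Hypothetical demo: identical commits that differ only in their parent.
demo_parent_changes_hash() (
  set -euo pipefail
  cd "$(mktemp -d)"
  git init -q
  # Fix identity and dates so that only the parent differs between F and F'.
  export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
  export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
  export GIT_AUTHOR_DATE='2022-01-01 00:00:00 +0000'
  export GIT_COMMITTER_DATE='2022-01-01 00:00:00 +0000'

  tree=$(git write-tree)                        # empty snapshot, reused for every commit
  b=$(git commit-tree "$tree" -m 'B')           # tip copied as is from repo 1
  e=$(git commit-tree "$tree" -m 'E')           # corresponding commit in repo 2
  f=$(git commit-tree "$tree" -p "$e" -m 'F')   # F as it exists in repo 2
  f2=$(git commit-tree "$tree" -p "$b" -m 'F')  # F': same content, but parent is B
  echo "$f $f2"
)
```

The two printed hash IDs differ even though both commits hold the same tree and metadata, which is why everything after the join point has to be re-created rather than copied.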

Two tools do this sort of thing: git filter-branch, which comes with Git, and git filter-repo, which you can get for it separately. Filter-branch is hard to use correctly. Filter-repo generally requires a little more code-writing, as it's a Python script that will evaluate your own Python code.

In this particular case, you might want to just take the existing filter-repo code and rework it to read multiple input repositories and figure out on its own which commits to join with which previous commits. This won't be easy, no matter how you go about doing it.
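As a rough alternative to reworking filter-repo, the "copy and re-parent" idea can be sketched with Git plumbing commands alone. This is only a sketch under stated assumptions, not a tested migration tool: `combine_repos` and its `component=path` argument format are hypothetical, every source repository is assumed to be linear (one branch, no merges), paths must be absolute, component names must not contain spaces, and the histories are assumed to be truly sequential in time, so that the snapshot of the newest commit of a component is always the right content for its folder.

```shell
# combine_repos <new-repo-dir> <component>=<absolute-path-to-repo>...
# Hypothetical helper, not part of Git.
combine_repos() (
  set -euo pipefail
  out=$1; shift
  git init -q "$out"
  cd "$out"
  git symbolic-ref HEAD refs/heads/master
  list=$(mktemp); msg=$(mktemp)

  n=0
  for spec in "$@"; do
    name=${spec%%=*}; path=${spec#*=}
    n=$((n + 1))
    # Import the source history under a private ref.
    git fetch -q "$path" "HEAD:refs/import/$n"
    # One line per commit, oldest first: author timestamp, hash, component.
    git log --reverse --format="%at %H $name" "refs/import/$n" >> "$list"
  done
  # Stable numeric sort keeps each repo's own order for equal timestamps.
  sort -s -n -k1,1 -o "$list" "$list"

  parent=""
  while read -r _ hash name <&3; do
    # Replace the component's folder in the index with this commit's snapshot.
    git rm -r -f -q --cached --ignore-unmatch -- "$name"
    git read-tree --prefix="$name/" "$hash"
    tree=$(git write-tree)
    # Skip commits that change nothing, e.g. history duplicated across streams.
    if [ -n "$parent" ] && [ "$tree" = "$(git rev-parse "$parent^{tree}")" ]; then
      continue
    fi
    git log -1 --format=%B "$hash" > "$msg"
    # Copy identity, dates and message; only the tree and the parent are new.
    parent=$(
      GIT_AUTHOR_NAME=$(git log -1 --format=%an "$hash") \
      GIT_AUTHOR_EMAIL=$(git log -1 --format=%ae "$hash") \
      GIT_AUTHOR_DATE=$(git log -1 --format=%ad --date=raw "$hash") \
      GIT_COMMITTER_NAME=$(git log -1 --format=%cn "$hash") \
      GIT_COMMITTER_EMAIL=$(git log -1 --format=%ce "$hash") \
      GIT_COMMITTER_DATE=$(git log -1 --format=%cd --date=raw "$hash") \
      git commit-tree "$tree" ${parent:+-p "$parent"} -F "$msg"
    )
  done 3< "$list"

  git update-ref refs/heads/master "$parent"
  git reset -q --hard
  rm -f "$list" "$msg"
)
```

A call might look like `combine_repos /tmp/Combined_Project server=/repos/V2.0/server server=/repos/V3.0/server gui=/repos/V2.0/gui` (paths hypothetical). If a release stream kept receiving fixes after the next stream had started, the interleaved snapshots would flip a folder back and forth, so the result should be checked on a copy before relying on it.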

CodePudding user response：

"I had something more like this (as a result) in my mind: One Repository, no branches, just master"

Having done a lot of RTC-to-Git migrations these days, I can attest that I never follow that approach.

Instead:

- one repository per UCM component
- generally only one stream is imported, as main in the new Git repository
- only a few baselines from the RTC stream are imported, as shown here.

cd /path/to/git/repo
git --work-tree=/path/to/local/RTC/sandbox/aComponent add .
git commit -m "release x"
# change baseline in local workspace