Branch A has less code than branch B. I want to merge branch A into B so that B will end up with less code and essentially have the same exact code as A. Similar to undoing multiple commits. The problem is that I have to do this through a Pull Request merge. I cannot push directly to B, it has to be through A (the feature branch).

What should the Pull Request look like? When I try to merge A into B it doesn't detect any differences - why is that? If I flip the Pull Request around (B into A) it shows all the changes that B has but A doesn't have.

CodePudding user response：

TL;DR

You want a new commit whose snapshot is from an old commit. You can then make a PR from this. Making this new commit with normal Git tools is tricky, but making it with a bypass is easy. I'll leave that for the long section though.

Long

We need to distinguish here between a pull request (a thing GitHub adds,¹ over and above what Git does) and what Git does on its own. Once we do that, things get a little clearer, although since this is Git, they may still be rather unclear.

Git is really all about commits. Git isn't about files, though commits contain files. Git isn't about branches either, though we (and Git) use branch names to find the commits. So Git is all about commits. This means we need to know exactly what a commit is and does for us:

- Each commit is numbered. The numbers are, however, big and ugly and random-looking, expressed in hexadecimal, as, e.g., e9e5ba39a78c8f5057262d49e261b42a8660d5b9. We call these hash IDs (or sometimes more formally, object IDs or OIDs). There's no telling what hash ID some future commit will have. However, once a commit is made, that hash ID refers to that commit, and no other commit, anywhere, ever.² This allows two different Git repositories to see whether they have the same commits, by just comparing commit numbers. (We aren't going to use that property here, but it's important.)
- Each commit stores two things:
  - A commit has a full snapshot of every file (though these are compressed, sometimes very compressed, and, via the same sort of cryptographic tricks used to make the commit numbers, de-duplicated).
  - A commit also has some metadata: information about the commit itself, such as who made it, and when. In this commit data, each commit stores a list of previous commit hash IDs, usually exactly one element long. The single previous-commit hash ID is the parent of this commit.
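To see the snapshot-plus-metadata structure for yourself, here is a minimal sketch that builds a throwaway repository and prints a raw commit object. The file name, commit messages, and the "Demo User" identity are all made up for illustration.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo one > file.txt
git add file.txt
git commit -q -m "first"
first=$(git rev-parse HEAD)               # hash ID of the first commit

echo two >> file.txt
git commit -q -am "second"

# The raw commit object: a "tree" line (the snapshot), a "parent" line,
# author and committer lines, then the commit message.
git cat-file -p HEAD

# Pull the parent hash ID out of the metadata.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
echo "parent recorded in HEAD: $parent"
```

The parent printed at the end should be the hash ID saved in `first`. Running `git cat-file -p` on the first commit itself would show no parent line at all.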

This my-parent-is-Frank, Frank's-is-Barb stuff glues the commits together into their ancestry chains. When we use a normal git merge, Git uses the ancestry chain to figure out what to merge. We don't want a normal merge here though. Meanwhile this same parent stuff is how Git turns a commit—a snapshot—into a "change": to figure out what changed in "me", if my parent is commit feedcab (can't be frank, too many non-hexadecimal letters in that one) and I'm commit ee1f00d, Git compares the snapshots in these two commits. Whatever is the same, didn't change. Files that are different did change, and Git figures out—by playing a sort of Spot the Difference game—what changed in them and produces a recipe: do this to the feedcab version of this file, and you'll get the ee1f00d version.
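The snapshot-to-"change" conversion can be reproduced by hand. This is only a sketch with invented file names: it commits two snapshots and then asks Git to compare the child against its parent.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

printf 'alpha\n' > kept.txt
printf 'old line\n' > changed.txt
git add .
git commit -q -m "parent snapshot"

printf 'new line\n' > changed.txt         # only this file differs
git commit -q -am "child snapshot"

# Compare the two full snapshots: files that are the same are not listed.
changed=$(git diff --name-only HEAD~1 HEAD)
echo "files that differ: $changed"

# The "recipe" that turns the parent's version into the child's version.
git diff HEAD~1 HEAD
```

Both commits store kept.txt in full, but since it is identical in the two snapshots, the diff does not mention it.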

Now, nobody actually uses the raw commit numbers to find commits. What's the commit number of your latest commit? Do you know? Do you care? Probably not: you just use main or master or develop or some name to find it.

Here's how that works. Suppose we have a tiny repository, with just three commits in it. Let's call them A, B, and C (instead of using their real hash IDs, which are big and ugly and we don't know them anyway). These three commits look like this:

A <-B <-C   <--main

Commit C is our latest. It has a snapshot (a full copy of all of the files) and metadata. Its metadata lists the raw hash ID of earlier commit B: we say that C points to B. Commit B, meanwhile, has a snapshot and some metadata, and B's metadata points to A. A has a snapshot and metadata, and since A was the first commit, its metadata simply doesn't list a parent. It's an orphan, sort of (and all the commits were virgin births, sort of—well, let's not go down this road any further). So this is where the action stops, and that's how we know there are just the three commits.

But we find commit C by name: the name main points to C (holds the raw hash ID of C), just like C points to B.

To make a new commit, we check out main, so that C is our current commit. We change stuff, add new files, remove old files, whatever, and use git add and then git commit to make a new snapshot. The new snapshot gets a new random-looking hash ID, but we'll just call it D. D points back to C:

A <-B <-C   <--main
         \
          D

and now git commit pulls off its clever trick: it writes D's hash ID into the name main:

A--B--C--D   <-- main

Now main points to D instead of C, and there are now four commits.
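Here is a small sketch of that name-moving trick, using a scratch repository whose commit messages A through D stand in for the real hash IDs. The identity and file name are placeholders.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/main     # make sure the first branch is "main"
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

for name in A B C; do
  echo "$name" > file.txt
  git add file.txt
  git commit -q -m "$name"
done

before=$(git rev-parse main)              # hash ID of C
echo D > file.txt
git commit -q -am "D"
after=$(git rev-parse main)               # hash ID of D
d_parent=$(git rev-parse main~1)          # the parent stored in D

echo "main moved from $before to $after"
echo "D's parent is    $d_parent"
```

The name main now holds D's hash ID, and D's parent is the hash ID that main held a moment earlier.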

Because people use names, not numbers, to find commits, we can go back to some old commit by throwing out our access to the newer commits. We force a name, like main, to point to some older commit, like C or B, and forget that D exists. That's what git reset is about. That's presumably not what you want here though, especially because Git and GitHub like to add new commits, not take them away. A pull request in particular won't let you take a commit away.

No, what you want instead is to make a new commit whose snapshot matches some old commit.

¹If you're not using GitHub, perhaps you are using some other site that also adds Pull Requests. This gets a bit tricky since each site that adds them, does it their own way. GitLab, for instance, have something similar but call them Merge Requests (rather a better name, I think).

²This depends on some cryptographic tricks that will eventually fail. The size—the big-and-ugly-ness of the hash ID—pushes the failure off as long as we need, although now it's a bit too small and they're going to get even bigger and uglier soon.

Normal merges

In normal everyday Git usage, we make branch names, and we use those branch names to add commits. I already showed a really simple example. Let's get a little more complicated. We'll start with a small repository, as before:

...--G--H   <-- br1 (HEAD)

I've added the HEAD notation here to indicate that this is the name of the branch we have checked out. Let's now add another branch name, br2, that also selects commit H right now:

...--G--H   <-- br1 (HEAD), br2

Since we're using commit H via the name br1, any new commits we make now update only the name br1. Let's make two new commits:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- br2

Now let's check out commit H again, with git switch br2:

          I--J   <-- br1
         /
...--G--H   <-- br2 (HEAD)

and make two more commits:

          I--J   <-- br1
         /
...--G--H
         \
          K--L   <-- br2 (HEAD)

We can now run git checkout br1 and then git merge br2, or just run git merge br1 now. Let's do the former: the snapshot we get in the end is the same either way, but other things change a bit, so we have to pick one.

Either way, Git now has to perform a real merge (not a fast-forward fake merge, but a real one). To perform a merge, Git needs to figure out what we changed on br1, and what they (ok, we, but not for the moment) changed on br2. That means Git has to figure out where we both started—and if we just look at the drawing, it's pretty clear: we both started from commit H. We made "our" changes and committed (several times) and got the snapshot that is in J.

The difference from H to J:

git diff --find-renames <hash-of-H> <hash-of-J>

tells Git what we changed on br1.

A similar difference:

git diff --find-renames <hash-of-H> <hash-of-L>

tells Git what they changed on br2. (Note that Git is using the commits here: the branch names, br1 and br2, just served to find the commits. Git then used the history—as recorded in the parents in each commit—to find the best shared starting-point commit H.)
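You can ask Git for the shared starting point directly with git merge-base. The sketch below builds a cut-down version of the drawing (one commit per branch instead of two, with invented file names) and runs the two diffs by hand.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/br1      # start on br1
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

printf 'a\n' > ours.txt
printf 'b\n' > theirs.txt
git add .
git commit -q -m "H"
h=$(git rev-parse HEAD)
git branch br2                            # br2 also selects H for now

printf 'a changed\n' > ours.txt
git commit -q -am "J"                     # new commit on br1

git switch -q br2
printf 'b changed\n' > theirs.txt
git commit -q -am "L"                     # new commit on br2

base=$(git merge-base br1 br2)            # the best shared commit
ours=$(git diff --find-renames --name-only "$base" br1)
theirs=$(git diff --find-renames --name-only "$base" br2)

echo "merge base:   $base"
echo "we changed:   $ours"
echo "they changed: $theirs"
```

The merge base comes out as commit H, found purely from the parent links, not from the branch names.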

To perform the merge itself, Git now combines the two diff listings. Where we changed some file and they didn't, Git uses our changes. Where they changed a file and we didn't, Git uses their changes. Where we both changed the same file, Git has to combine those changes.

If we both made the exact same change, that's fine. If we touched different lines, that's fine too (although there's an edge case here: if our changes abut, Git declares a merge conflict; but if they overlap exactly, with the same changes, that's OK). If all goes well, so that there are no merge conflicts while combining changes, Git can apply the combined changes to the snapshot from H. This keeps our changes and adds theirs, or, equivalently, keeps their changes and adds ours. Where our changes overlap exactly, Git keeps just one copy of the changes.

The resulting snapshot—H plus both sets of changes—goes into our new merge commit. There's one thing that is special about this new merge commit though. Instead of just the one normal parent, which in this case—on branch br1—would be J, it gets two parents:

          I--J
         /    \
...--G--H      M   <-- br1 (HEAD)
         \    /
          K--L   <-- br2

As always, Git updates the current branch name to point to the new merge commit M. The merge is now complete.
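To check the two-parent claim, here is a sketch that performs a real merge in a scratch repository (again simplified to one commit per side, with made-up file names) and then inspects the parents of M.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/br1
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

printf 'a\n' > ours.txt
printf 'b\n' > theirs.txt
git add .
git commit -q -m "H"
git branch br2

printf 'a changed\n' > ours.txt
git commit -q -am "J"
j=$(git rev-parse HEAD)

git switch -q br2
printf 'b changed\n' > theirs.txt
git commit -q -am "L"
l=$(git rev-parse HEAD)

git switch -q br1
git merge -q -m "M" br2                   # a real merge, not a fast-forward

first_parent=$(git rev-parse HEAD^1)      # should be J
second_parent=$(git rev-parse HEAD^2)     # should be L
echo "M's parents: $first_parent $second_parent"
cat ours.txt theirs.txt                   # both sets of changes are present
```

Had we merged in the other direction (on br2, merging br1), the snapshot would be the same but the parent order would be swapped.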

`git merge -s ours`

Let's draw what you want. You are starting with this:

          o--o--...--R   <-- br-A
         /
...--o--*
         \
          o--o--...--L   <-- br-B (HEAD)

You would like to git merge br-A, but keep the snapshot from the commit L at the tip of br-B.

To accomplish what you want in raw Git, you would run:

git switch br-B
git merge -s ours br-A

Git would now find the merge base * (or not bother, really), then ... completely ignore their changes, and make a new merge commit M, on the current branch:

          o--o--...--R   <-- br-A
         /            \
...--o--*              \
         \              \
          o--o--...--L---M   <-- br-B (HEAD)

where merge commit M has L and R as its two parents, but uses commit L as the snapshot.
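A quick way to convince yourself of this is to compare tree hash IDs, since two commits with the same snapshot have the same tree. The following is a sketch on a scratch repository; the file names only-on-A.txt and only-on-B.txt are invented so that the two branches visibly differ.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/br-B
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo base > shared.txt
git add .
git commit -q -m "base (*)"
git branch br-A

echo b > only-on-B.txt
git add .
git commit -q -m "L"
l=$(git rev-parse HEAD)

git switch -q br-A
echo a > only-on-A.txt
git add .
git commit -q -m "R"
r=$(git rev-parse HEAD)

git switch -q br-B
git merge -q -s ours -m "M" br-A          # keep our snapshot, ignore theirs

m_tree=$(git rev-parse 'HEAD^{tree}')
l_tree=$(git rev-parse "$l^{tree}")
echo "tree of M: $m_tree"
echo "tree of L: $l_tree"
git rev-parse HEAD^1 HEAD^2               # parents: L, then R
```

The two tree hash IDs match, and only-on-A.txt never shows up on br-B even though R is now part of its history.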

That's easy, in raw Git. But GitHub won't do this! How do we get GitHub to deliver this kind of result?

We have to trick GitHub a bit

Suppose, for argument's sake, that we were to git switch br-A (i.e., check out commit R) and then make a new commit whose snapshot is that from commit L? That is, we make:

          o--...--R--L'  <-- br-A (HEAD)
         /
...--o--*
         \
          o--o--...--L   <-- br-B

Commit L' has a different hash ID from commit L, and has different metadata—we made it just now, with our name and email and date and time and so on, and its parent is R—but has the same snapshot as commit L.

If we had Git do a normal merge here, Git would:

git diff --find-renames <hash-of-*> <hash-of-L>
git diff --find-renames <hash-of-*> <hash-of-L'>

to get the two diffs that Git needs to combine. These diffs would show exactly the same changes.

A normal merge will combine these changes by taking one copy of all of the changes. So that's just what we want! The final merge result will be:

          o--...--R--L'  <-- br-A
         /            \
...--o--*              M   <-- br-B (HEAD)
         \            /
          o--o--...--L

where I've drawn this in the other style (with M in the middle) for no particular reason. The snapshot in M will match both commits L and L', and branch br-B will end at the new commit, with no changes to any files, but with a new commit on the end.

We can easily make commit L' in Git, and then raise a Pull Request on GitHub by sending commits up through L' on our br-A branch. The PR will merge smoothly, by "changing" nothing at all in br-B, just adding the new merge commit M. So—except for the extra L' commit—we get the same effect as with git merge -s ours run on branch br-B.

Doing this the hard way

The hard way to get snapshot L' added to branch br-A is this:

git switch br-A
git rm -r .                         # from the top level
git restore -SW --source br-B -- .
git commit -C br-B

for instance. The first step puts us on br-A with commit R checked out. The second one, git rm -r ., removes all files from Git's index / staging-area, and the corresponding files from our working tree. The git restore puts all files back, but takes them from --source br-B, i.e., commit L. The last step, git commit -C br-B, makes a new commit reusing the message (and author information) from commit L. (With -c instead of -C, you get a chance to edit the message.)
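If you want to try the remove-and-restore sequence somewhere harmless first, this sketch runs it in a scratch repository and checks that the new commit on br-A really has the same snapshot as the tip of br-B. The branch contents are invented, and git switch and git restore assume Git 2.23 or later.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/br-B
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo base > shared.txt
git add .
git commit -q -m "base (*)"
git branch br-A

echo b > only-on-B.txt
git add .
git commit -q -m "L"

git switch -q br-A
echo a > only-on-A.txt
git add .
git commit -q -m "R"
r=$(git rev-parse HEAD)

# The four steps from the answer, run from the top level:
git switch -q br-A
git rm -r -q .
git restore -SW --source br-B -- .
git commit -q -C br-B                     # this is L'

new_tree=$(git rev-parse 'br-A^{tree}')
b_tree=$(git rev-parse 'br-B^{tree}')
echo "tree of L': $new_tree"
echo "tree of L:  $b_tree"
```

Comparing the two tree hash IDs this way is also a cheap sanity check to run in your real repository before pushing br-A.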

This works fine, it's just a bit slow. To go faster, we can use either of two tricks. Here's the first one, which is probably the one I would actually use:

git switch br-A
git read-tree -u --reset br-B
git commit -C br-B

This eliminates the remove-and-restore in favor of git read-tree, which can do them in one swoop. (You can use -m instead of --reset but one of the two flags is required, and git read-tree is a tricky command that I don't like to use much, so I never remember offhand which one to use: fortunately, here it doesn't matter.)
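Here is the read-tree variant put together with the final merge, as a sketch on a scratch repository. The local git merge at the end only stands in for what merging the Pull Request would do. How the hosting site actually performs the merge is outside this sketch, and the file names are made up.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/br-B
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo base > shared.txt
git add .
git commit -q -m "base (*)"
git branch br-A

echo b > only-on-B.txt
git add .
git commit -q -m "L"
l=$(git rev-parse HEAD)

git switch -q br-A
echo a > only-on-A.txt
git add .
git commit -q -m "R"

# Make L' on br-A: index and working tree now match br-B's snapshot.
git read-tree -u --reset br-B
git commit -q -C br-B
l_prime=$(git rev-parse HEAD)

# Stand-in for merging the Pull Request: an ordinary merge into br-B.
git switch -q br-B
git merge -q -m "M" br-A

m_tree=$(git rev-parse 'HEAD^{tree}')
l_tree=$(git rev-parse "$l^{tree}")
echo "tree of M: $m_tree"
echo "tree of L: $l_tree"
```

The merge completes without conflicts because both sides made identical changes relative to the merge base, and the resulting snapshot matches L.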

Or, we can do this:

git switch br-B      # so that we are not on br-A
git branch -f br-A $(git log --no-walk --format=%B br-B | git commit-tree -F - -p br-A br-B^{tree})

if I haven't made any typos. This gives you no chance to edit the commit message, though. You need not check out br-B directly, you just need to make sure that either you're not on br-A, or you use git merge --ff-only to move forward after making the commit.
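The one-liner is easier to trust after watching it work once. This sketch splits it into two steps in a scratch repository with invented contents, and then checks the parent and snapshot of the commit it creates.

```shell
set -eu
cd "$(mktemp -d)"                         # throwaway directory, demo only
git init -q .
git symbolic-ref HEAD refs/heads/br-B
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo base > shared.txt
git add .
git commit -q -m "base (*)"
git branch br-A

echo b > only-on-B.txt
git add .
git commit -q -m "L"

git switch -q br-A
echo a > only-on-A.txt
git add .
git commit -q -m "R"
r=$(git rev-parse HEAD)

git switch -q br-B                        # so that we are not on br-A

# Build L' directly: br-B's tree, br-B's message, parent = tip of br-A.
new=$(git log --no-walk --format=%B br-B |
      git commit-tree -F - -p br-A 'br-B^{tree}')
git branch -f br-A "$new"                 # move br-A to the new commit

git log -1 --format='%h %s' br-A
```

Nothing in the working tree or index is touched here, which is why this only works cleanly when br-A is not the branch you have checked out.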

It would be nice if GitHub could do a `git merge -s ours`

But it can't, so that's that.

CodePudding user response：

Try a rebase, where A is the feature branch (including the cleaned-up code) and B is your dev branch.

First, save your dev work:

git checkout B
git add .
git commit -am "blabla my dev"

Then update A:

git checkout A
git pull

Then rebase B on top of A:

git checkout B
git rebase A

At this point you might have to resolve some conflicts.