How to remove certain files from a pull request with a lot of commits?

Time:11-19

To generalize, suppose I have a project with the following directory. How would I ultimately remove file2.txt after pushing and doing a pull request?

app/someFolder
  - file1.txt
  - file2.txt
  - file3.txt

Suppose my commits are these:

Commit 1       
  file1.txt
    Hello World
  file2.txt
    Cool, Superb
  file3.txt
    December 2

git add .
git commit -m "commit 1"
git push --set-upstream origin someBranchOnRemote

 Commit 2  
   file1.txt
     Hello World
     Boss Bass

git add .
git commit -m "commit 2"
git push

 Commit 3
   file3.txt
     December 3

git add .
git commit -m "commit 3"
git push

So if I were to open a pull request, the files would look like this:

file1.txt
  Hello World
  Boss Bass
file2.txt
  Cool, Superb
file3.txt
  December 3

Now how would I update the pull request so that file2.txt is not included? Suppose the hashes are hash1, hash2 and hash3. The final output I want in the pull request would be:

file1.txt
  Hello World
  Boss Bass
file3.txt
  December 3
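To experiment with the answers below without touching a real repository, the scenario can be reproduced in a throwaway directory. This is a sketch under assumptions the question does not state: the files sit at the repository root rather than in app/someFolder, a local bare repository (origin.git) stands in for the hosting site, the base and feature branches use the first answer's names (somebranch and my-fancy-new-feature), and the base branch already holds placeholder versions of all three files. The "Demo User" identity is made up.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # hypothetical stand-in for the hosting site
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name "Demo User"          # hypothetical identity for the demo
git config user.email "demo@example.com"

# Base commit with placeholder contents, published as origin/somebranch.
git checkout -q -b somebranch
for f in file1.txt file2.txt file3.txt; do echo original > "$f"; done
git add . && git commit -q -m "base"
git remote add origin "$tmp/origin.git"
git push -q origin somebranch && git fetch -q origin

# The question's three commits, each pushed as in the question.
git checkout -q -b my-fancy-new-feature
echo "Hello World" > file1.txt
echo "Cool, Superb" > file2.txt
echo "December 2" > file3.txt
git add . && git commit -q -m "commit 1"
git push -q --set-upstream origin my-fancy-new-feature

printf 'Hello World\nBoss Bass\n' > file1.txt
git add . && git commit -q -m "commit 2"
git push -q

echo "December 3" > file3.txt
git add . && git commit -q -m "commit 3"
git push -q

# Files the pull request would show as changed relative to the base branch.
git diff --name-only origin/somebranch my-fancy-new-feature
```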

Answer:

TL;DR: you want git rebase -i followed by git push --force or git push --force-with-lease. But read the following.

First, a side note: Git itself does not have "pull requests"; those are features of certain hosting sites such as GitHub and Bitbucket. They tend to work similarly on each hosting site, but each site has its own quirks and behaviors here. You may have to adapt this answer for whichever hosting site you're using.

With that out of the way: a PR is a request you make to someone, asking them to merge (or "fetch and merge" = "pull") some commit(s) you've made. In Git, you don't really merge a branch: you actually merge commits. The commits that you will merge, when you run git merge, are those in some chain of commits, identified by the last commit in that chain.

That is: commits form chains. Each commit in a chain remembers the raw hash ID of its predecessor commit. We say that a commit points to its parent commit, and we can draw that like this:

... <-E <-F <-G <-H

A branch name then simply provides the raw hash ID of the last commit in the chain, from which Git will find all the previous commits:

...--E--F--G--H   <-- branch
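Both ideas can be seen in a real repository: the commit object stores its parent's hash ID, and the branch name stores nothing but the tip's hash ID. A minimal sketch in a throwaway repository; the branch name branch, the file f.txt and the demo identity are hypothetical.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
git checkout -q -b branch

echo one > f.txt && git add f.txt && git commit -q -m "G"
echo two > f.txt && git commit -q -am "H"

git rev-parse branch      # the name holds only the hash ID of H
git cat-file -p branch    # H's commit object: tree, parent, author, committer, message
git rev-parse branch~1    # the "parent" line above is this hash ID: commit G
```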

When you go to make a pull request, you:

  • begin by forking and/or cloning some repository, so that you get all the commits that someone else has;
  • create a new branch name, so that you have a name pointing to the last commit that's also one of their commits;
  • make new commits so that your branch name advances.

For instance, let's say that their commits go up through (and then stop at) the commit I was drawing above as E. (By the way, I only stopped drawing arrows between commits out of laziness: commits always point backwards, so any time you see a connecting "line", it's really a backwards-pointing "arrow".)

That is, they have, in their repository, some sequence of commits:

...--D--E   <-- somebranch

You now have, in your repository:

...--D--E    <-- origin/somebranch

You create a new branch name pointing to commit E:

...--D--E    <-- my-fancy-new-feature, origin/somebranch

Now you make new commits while "on" this new branch:

...--D--E    <-- origin/somebranch
         \
          F   <-- my-fancy-new-feature (HEAD)

This is your "hash 1", or "commit 1", that affects three files. Commit F has all the files in it, as all commits always have a full snapshot of every file, but the files in commit F are all the same as all the files in commit E, except for the three that you changed. (Git cleverly de-duplicates identical files, so that this doesn't take very much space, either.)

Now that commit F exists, you make another new commit G:

...--D--E    <-- origin/somebranch
         \
          F--G   <-- my-fancy-new-feature (HEAD)

This is your "commit 2", which changes only file file1.txt. Commit G still has every file, it's just that its copy of file2.txt matches that of commit F; its copy of file3.txt matches that of commit F; and all its other files match those of commits F and E.
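The full-snapshot and de-duplication claims can be checked with plumbing commands: git ls-tree lists every file in a commit, and git rev-parse <commit>:<path> prints the ID of the stored file contents (the blob). A sketch in a throwaway repository with hypothetical file contents; HEAD plays the role of G and HEAD~1 the role of F.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

for f in file1.txt file2.txt file3.txt; do echo original > "$f"; done
git add . && git commit -q -m "E"
for f in file1.txt file2.txt file3.txt; do echo "changed in F" > "$f"; done
git commit -q -am "F"
echo "changed again in G" > file1.txt
git commit -q -am "G"

# G is a full snapshot: all three files are listed, not just file1.txt.
git ls-tree --name-only HEAD
# Unchanged file: F and G refer to the very same stored blob.
git rev-parse HEAD~1:file2.txt HEAD:file2.txt
# Changed file: the blob IDs differ.
git rev-parse HEAD~1:file1.txt HEAD:file1.txt
```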

Finally, you add commit H:

...--D--E    <-- origin/somebranch
         \
          F--G--H   <-- my-fancy-new-feature (HEAD)

In commit H you've replaced file3.txt with a modified file; file1.txt and file2.txt match the copies in commit G, and so on.

That brings us to your question again:

... how would I update the pull request so I can have file2.txt not be included?

Git works on the basis of commits, not files, and your PR says please merge commit H. To change this, you must either:

  • somehow change commit H, or
  • change the PR so that it lists some other commit hash ID, not H.

It's literally impossible to change anything about any commit, ever, so the first idea is right out.

Whether it's possible to change the PR so that it lists some other commit depends on the hosting site. If the hosting site is particularly obnoxious, you might have to close this PR and open a new one later. But GitHub, at least, will let you update the PR quite simply.

Your first task, though, is to come up with new commits. You don't want file2.txt changed, but it was different in commit F (vs commit E), so commit F itself is bad in some way. This means you need a new replacement for commit F. Let's call this F' to indicate that it's a lot like F, but it will have a different raw hash ID.

To get commit F', we want to "copy" commit F without quite committing yet. We'll start by checking out commit E. We could create another new branch name, but we could also use Git's "detached HEAD" mode, like this:

...--E   <-- HEAD, origin/somebranch
      \
       F--G--H   <-- my-fancy-new-feature

Now we'll run, say, git cherry-pick -n and give Git commit F's hash ID, or something equivalent: my-fancy-new-feature~2 for instance. Git will copy the effect of F but not commit anything yet—we'll have some work in progress that we can commit—and now we have a chance to undo the change to file2, with, e.g., git restore:

git restore -SW --source=origin/somebranch file2.txt

A quick git status and git diff --cached will show that we've now retained the updated versions of file1.txt and file3.txt, but gone back to the original file2.txt from commit E as found by the name origin/somebranch.

We can now run git commit to make F':

       F'  <-- HEAD
      /
...--E   <-- origin/somebranch
      \
       F--G--H   <-- my-fancy-new-feature

Commit G affects file1.txt only, so we can just copy it wholesale, with git cherry-pick, which will not only figure out what it changed and apply it, but also make a new commit, re-using the original commit's message:

       F'-G'  <-- HEAD
      /
...--E   <-- origin/somebranch
      \
       F--G--H   <-- my-fancy-new-feature

You might wonder why we copy G to G', rather than just using G itself. The answer is simple: nothing about commit G can ever change. The arrow coming out of G, pointing to F, is part of G. It can't change! Commit G will forever point back to commit F, never to commit F'. So we have to copy G.

Also, commit G has the wrong copy of file2.txt in it, of course, which would also force us to copy it—but anything that forces us to copy the commit, forces the whole thing. Note that when we do "copy" G with cherry-pick, Git compares the snapshot in G to that in F to see what changed. Since file2.txt in this pair-of-commits did not change, Git won't change file2.txt in G' vs F'. So G' will have the same file2.txt as F', and F' has the same file2.txt as E.

Now, for the same reasons, we need to copy H, which we can do with one more git cherry-pick command. The result is:

       F'-G'-H'  <-- HEAD
      /
...--E   <-- origin/somebranch
      \
       F--G--H   <-- my-fancy-new-feature

Now that we have the right commits, all (all?!) we have to do is to get the name my-fancy-new-feature to point to H' instead of H. We can do that in various ways, such as git checkout -B my-fancy-new-feature or git switch -C my-fancy-new-feature. The final result here will be:

       F'-G'-H'  <-- my-fancy-new-feature (HEAD)
      /
...--E   <-- origin/somebranch
      \
       F--G--H   ???

What happens to the F-G-H chain that Git used to find by looking at the name my-fancy-new-feature? The answer is: nothing happens to it. It's still there. It's just that now, it goes unused. These aren't the droids (er, commits) you're looking for, so we just make sure that these aren't the commits we find.
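Spelled out as commands, the manual copy might look like the sketch below. It first rebuilds a hypothetical version of the scenario (a local bare repository as the hosting site, placeholder base contents, and file2.txt already present on the base branch, as the description of commit E implies). git switch --detach is one way to get the detached HEAD, and git commit -C is one way to reuse F's message for F'. The last line shows the old H is still reachable, here through the branch's reflog.

```shell
set -eu
# ---- Hypothetical setup: E on origin/somebranch, F-G-H on the feature branch ----
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # stand-in for the hosting site
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
git checkout -q -b somebranch
for f in file1.txt file2.txt file3.txt; do echo original > "$f"; done
git add . && git commit -q -m "base (E)"
git remote add origin "$tmp/origin.git"
git push -q origin somebranch && git fetch -q origin
git checkout -q -b my-fancy-new-feature
echo "Hello World" > file1.txt
echo "Cool, Superb" > file2.txt
echo "December 2" > file3.txt
git add . && git commit -q -m "commit 1"              # F
printf 'Hello World\nBoss Bass\n' > file1.txt
git commit -q -am "commit 2"                          # G
echo "December 3" > file3.txt
git commit -q -am "commit 3"                          # H
git push -q --set-upstream origin my-fancy-new-feature
old_h=$(git rev-parse my-fancy-new-feature)

# ---- The manual copy ----
# Detach HEAD at E, the commit that origin/somebranch names.
git switch -q --detach origin/somebranch

# F': apply F's changes without committing, put back E's file2.txt,
# then commit, reusing F's message and authorship via -C.
git cherry-pick -n my-fancy-new-feature~2
git restore -SW --source=origin/somebranch file2.txt
git commit -q -C my-fancy-new-feature~2

# G' and H': G and H are copied as-is.
git cherry-pick my-fancy-new-feature~1
git cherry-pick my-fancy-new-feature

# Point the branch name at H' and re-attach HEAD to it.
git switch -q -C my-fancy-new-feature

git diff --name-only origin/somebranch my-fancy-new-feature   # file1.txt, file3.txt
git rev-parse 'my-fancy-new-feature@{1}'   # the old H, still findable via the reflog
```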

We now have the right commits, locally, in this repository. Now we have to get them to the hosting site, and get the hosting site to update the pull request. To do that on GitHub, we just push the new commits to GitHub, telling the Git over on GitHub to replace the F-G-H commits in its repository with our new F'-G'-H' chain.

Git in general is greedy for commits, so if we just run a regular git push origin my-fancy-new-feature, they (the Git over on GitHub, operating on your repository over there) will reject our attempt to do this. They will say, in effect, No! If I do that I'll lose the F-G-H chain! (As with our own repository, the commits won't be gone; they just won't be findable by the name my-fancy-new-feature any more. But that's enough for them to reject the request.) You'll likely get a suggestion that you pull (i.e., fetch and merge) the commits from GitHub: they don't realize that they got those commits from you in the first place, and that you're telling them these are the new and improved replacements, so they should ditch the old ones in favor of the new ones.

To make them realize that, you need some kind of forced-push (not Star Wars style "force", but just the regular English-language meaning). Git has several kinds and you can use any of them here, but --force-with-lease has a safety feature (that shouldn't matter here: if it does, something has gone not-according-to-plan, and the safety feature detects that) and is generally the way to go.
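A sketch of the difference between a plain push and --force-with-lease, using a local bare repository as a hypothetical stand-in for GitHub. To keep it short, the history rewrite is simulated with git commit --amend on a single commit; a rewrite such as the cherry-pick sequence above is refused and then accepted the same way.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # hypothetical stand-in for GitHub
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
git checkout -q -b my-fancy-new-feature
echo "Cool, Superb" > file2.txt
git add . && git commit -q -m "commit 1"
git remote add origin "$tmp/origin.git"
git push -q origin my-fancy-new-feature

# Stand-in for any history rewrite: replace the tip with a new commit.
echo "Cool" > file2.txt
git commit -q -a --amend --no-edit

# A plain push is refused: the remote would lose the commit it has now.
if git push -q origin my-fancy-new-feature 2>/dev/null; then
  echo "unexpected: plain push accepted"
else
  echo "plain push rejected (non-fast-forward)"
fi

# --force-with-lease overwrites only if the remote still holds the commit
# we last saw there (recorded in origin/my-fancy-new-feature).
git push -q --force-with-lease origin my-fancy-new-feature
```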

Making this easy(ish)

The sequence above has lots of Git commands in it, many of them tricky (I didn't show the full commands for multiple reasons). We can reduce that to a smaller number of much-less-tricky commands using git rebase -i. There's still one big bit of trickiness though.

Running:

git switch my-fancy-new-feature
git rebase -i origin/somebranch

is how we start. The rebase operates on the current branch, so we begin by checking out my-fancy-new-feature (you can use git checkout or git switch here, or do nothing if you're already on it).

What rebase does is:

  • list out commits to copy (hash IDs);
  • use Git's detached HEAD mode to begin copying; and
  • start cherry-picking.

Once it's all done, it fixes up the detached HEAD by moving the branch name to the last of the copied commits (H' in our case). So that automates a lot of the hard work.

Rebase in general is what we use when we have some commits that we mostly like, but there is something about those commits that we don't like. Since nothing about any existing commit can change, rebase works by copying the commits. The new copies can be changed along the way, before we commit them.

The interactive rebase in particular gives us more opportunities for change. A plain rebase just copies everything without giving us a chance to fix stuff, which is useful for moving commits—for taking a chain like this:

          A--B--C   <-- topic
         /
...--o--o--o--o   <-- mainline

and copying it to:

          A--B--C   ???
         /
...--o--o--o--o   <-- mainline
               \
                A'-B'-C'  <-- topic

so that the commits now come at the end of the mainline, instead of sprouting from an earlier point. That's not what we want here: we want to change some of the files in one of the commits.
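The plain, non-interactive rebase from the drawing can be sketched in a throwaway repository. The branch names mainline and topic follow the drawing; the files, commit messages and demo identity are hypothetical.

```shell
set -eu
cd "$(mktemp -d)"
git init -q
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"

git checkout -q -b mainline
echo base > base.txt && git add . && git commit -q -m "o (fork point)"
git checkout -q -b topic                  # topic sprouts from here
for c in A B C; do echo "$c" > "$c.txt"; git add "$c.txt"; git commit -q -m "$c"; done
git checkout -q mainline                  # meanwhile, mainline moves on
echo more > more.txt && git add . && git commit -q -m "o (later)"

git checkout -q topic
old_tip=$(git rev-parse topic)
git rebase -q mainline                    # copies A-B-C to A'-B'-C' after mainline's tip
git log --oneline --graph mainline topic
```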

So, interactive rebase, instead of just planning out all the cherry-picks and then starting them, writes out an instruction sheet. This instruction sheet lists the cherry-picks, using the word pick for each one:

pick hash1  subject
pick hash2  subject
pick hash3  subject

Then, once the instruction sheet is written, git rebase -i opens an editor on the instruction sheet so that we can change the commands.

In our case, we don't want to just pick commit #1 as-is. We want a chance to change it. So we will change pick to edit. We do want to pick #2 and #3 as-is, so we'll leave those alone. Then we write out the instruction sheet and exit the editor[1] to return to the cherry-pick action.

Having changed the first pick to edit, Git will cherry-pick the first commit but then stop to let us fix it up. There's one thing that's particularly tricky here: Git has actually made a temporary commit at this point, so when we do fix it up, we have to run git commit --amend.[2] We can now do our git restore as I described earlier, then run git commit --amend:

git restore -SW --source origin/somebranch file2.txt
git commit --amend

(note: --source=origin/somebranch and --source origin/somebranch work the same way here, so you can use either one).

Once we're done fixing up that edit-able commit, we tell Git to resume the rebase:

git rebase --continue

This will finish off all the remaining cherry-picks, then re-arrange the branch name and re-attach our HEAD to our branch, and now we have what we wanted:

       F'-G'-H'  <-- my-fancy-new-feature (HEAD)
      /
...--E   <-- origin/somebranch
      \
       F--G--H   ???

We're now ready to run:

git push --force-with-lease origin my-fancy-new-feature

and if we're talking GitHub, the "update the PR without closing and re-opening" is now done.

We used a total of five or six Git commands, about half of what we needed earlier, and we didn't have to do anything tricky except for the interactive rebase "edit" step. Everything else is pretty straightforward here.
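The interactive-rebase steps can also be run non-interactively, which is handy for trying them out safely. This sketch rebuilds a hypothetical version of the scenario (local bare repository standing in for GitHub, placeholder base contents, file2.txt present on the base branch) and uses the GIT_SEQUENCE_EDITOR environment variable with a sed command in place of hand-editing the instruction sheet. In everyday use you would simply let git rebase -i open your editor.

```shell
set -eu
# ---- Hypothetical setup: E on origin/somebranch, F-G-H on the feature branch ----
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # stand-in for GitHub
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
git checkout -q -b somebranch
for f in file1.txt file2.txt file3.txt; do echo original > "$f"; done
git add . && git commit -q -m "base (E)"
git remote add origin "$tmp/origin.git"
git push -q origin somebranch && git fetch -q origin
git checkout -q -b my-fancy-new-feature
echo "Hello World" > file1.txt
echo "Cool, Superb" > file2.txt
echo "December 2" > file3.txt
git add . && git commit -q -m "commit 1"
printf 'Hello World\nBoss Bass\n' > file1.txt
git commit -q -am "commit 2"
echo "December 3" > file3.txt
git commit -q -am "commit 3"
git push -q --set-upstream origin my-fancy-new-feature

# ---- The rebase ----
git switch -q my-fancy-new-feature        # rebase works on the current branch
# sed stands in for the editor: turn the first "pick" into "edit".
# (-i.bak is accepted by both GNU and BSD sed.)
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick /edit /'" git rebase -i origin/somebranch

# The rebase has stopped at commit 1: undo its file2.txt change and amend.
git restore -SW --source origin/somebranch file2.txt
git commit -q --amend --no-edit
git rebase --continue                     # copies commits 2 and 3

git push -q --force-with-lease origin my-fancy-new-feature
git diff --name-only origin/somebranch my-fancy-new-feature   # file1.txt, file3.txt
```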


[1] Some editors don't exit: you start them up early and then they hang around forever. Examples include many setups of Emacs, Sublime, and Atom. If you're using one of these editors, you have to make it interact well with Git; that's a matter for that particular editor, but most of these editors these days have a --wait flag that makes all of that work right.
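For example, assuming Sublime Text with its subl command-line helper installed, the setting might look like this; -w is Sublime's wait flag and -n asks for a new window. The commented lines are alternatives for the other editors mentioned; check your editor's own documentation, since the exact flag varies.

```shell
# Use Sublime Text for commit messages and rebase instruction sheets.
# -n: open a new window; -w: wait until the file is closed before returning.
git config --global core.editor "subl -n -w"

# Alternatives (use one of these instead of the line above):
#   git config --global core.editor "atom --wait"
#   git config --global core.editor "emacsclient"   # waits by default; needs a running Emacs server

git config --global core.editor   # show the current setting
```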

[2] The --amend flag seems to change a commit, contradicting the claim above that we can't change any commit. The dirty secret here is that --amend doesn't change a commit. Instead, it just makes yet another commit. So when we use edit (followed by --amend), we generate extra, rather pointless, "trash" commits. But commits in Git are so small and cheap that it's better to do this than to avoid it. Git will eventually clean up after itself, though this generally takes more than a month. The cleanup / janitorial work that Git does is a bit slow, but sweeping up a month's worth of trash in 5 minutes, rather than cleaning up every bit of trash right away, is actually a highly practical tradeoff.

Answer:

Given your requirements, I would just delete the file, commit that deletion, and push it to your branch. That way there would be a fourth commit in the PR removing the file, and the final result would be as you describe.
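A sketch of this approach, assuming file2.txt is new on the feature branch (absent from the base branch), with a local bare repository standing in for the hosting site and the first answer's branch names, all hypothetical. Two caveats: nothing is rewritten, so the PR's commits still contain commit 1's version of file2.txt; and if file2.txt did exist on the base branch, deleting it would delete it from the target branch on merge, so restoring the base version (for example with the git restore --source command from the first answer) would be the equivalent fix.

```shell
set -eu
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"      # hypothetical stand-in for the hosting site
git init -q "$tmp/work" && cd "$tmp/work"
git config user.name "Demo User"          # hypothetical identity
git config user.email "demo@example.com"
git checkout -q -b somebranch
echo original > file1.txt && echo original > file3.txt
git add . && git commit -q -m "base"
git remote add origin "$tmp/origin.git"
git push -q origin somebranch && git fetch -q origin

git checkout -q -b my-fancy-new-feature
echo "Hello World" > file1.txt
echo "Cool, Superb" > file2.txt           # file2.txt is new on this branch
echo "December 2" > file3.txt
git add . && git commit -q -m "commit 1"
printf 'Hello World\nBoss Bass\n' > file1.txt
git commit -q -am "commit 2"
echo "December 3" > file3.txt
git commit -q -am "commit 3"
git push -q --set-upstream origin my-fancy-new-feature

# The suggested fix: a fourth commit that deletes the file, then a normal push.
git rm -q file2.txt
git commit -q -m "Remove file2.txt"
git push -q
git diff --name-only origin/somebranch my-fancy-new-feature   # file1.txt, file3.txt
```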
