Ref: The following question from about 9 years ago:
Pull request without forking?
Background:
I am learning about GitHub/Git, and I am running into issues. I have searched diligently but have found nothing that addresses this specific issue - the closest thing I have found is the question noted above.
Issue:
I "forked" a repository intending to do some work, make a change to my own fork, and then create a pull request back to the original project as a way to contribute to it.
I finally figured it out and was able to successfully create a pull request containing my proposed change.
Note that there are other things I want to do to contribute to this project and after I created the pull request, I continued work and made additional commits to my local copy including importing some technical documentation, etc.
Apparently, for whatever unknown reason, after I make a pull request, the pull request "owns" my fork of the original repo and anything I do thereafter becomes a part of that pull request. It doesn't matter whether it's related or not, whether I pushed it to the project's branch, or whether I added it to the PR. It just appears as if by magic, and can only be removed if I remove/revert the changes in my own repository fork.
Does this mean that all work on anything that has to do with that project has to come to a complete stop until that PR is accepted and/or rejected? If that's the case, how does anyone else, especially a company working on a single codebase, manage to get things done?
Of course, I am sure that this is possible, people do this all the time.
What research I have done has not disclosed anything that seems to address this specific issue, however other answers to different issues seem to hint at the fact that, once you fork a repo and create a pull request, the pull request DOES appear to "own" that instance of your local repo - and the only way to mitigate this is to:
- Fork the repo.
- Create an entire branch of the repo and do work.
- Commit to that branch and create a pull request, then abandon that branch.
To do additional work, regardless of where in the project, you have to:
- Create an entirely new branch.
- Do whatever work you wish to do that is supposed to be separate from the original work.
- Commit to the new branch, create the pull request, and then abandon that branch.
"Rinse and repeat" for any additional work you want to do, eventually having a fork with more branches than a Christmas Tree.
This gives rise to several questions:
1. Is this true? Do I understand this correctly?
2. Why? This seems to be unnecessarily complex and convoluted, especially with a single contributor.
The last and most important question:
3. How do I clean up my local copy? Apparently I should have cloned the repo, then created a branch to work in, then created the pull request. (i.e. Is there a way to take my updated "main", turn it into a branch and then re-create the original main so I can create additional branches to do additional work?)
I hesitate to just "hack at" the existing repo trying to figure things out as I don't want to pollute the original pull request or screw things up on the upstream project.
Thanks!
CodePudding user response:
When you open a pull request, you propose to merge one of your branches into a branch of the original repository. Every time you update your branch, the pull request is updated too. This is quite useful when you need to fix or update something after review.
There are several solutions for your case. The simplest: close your pull request, then create one branch per topic you want to submit (each branch based on the trunk of the forked repository).
Second solution: create a branch to keep your extra work, go back to the main branch (or master), force it back to the commit you originally submitted, and push it:
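# keep the extra work on a new branch that points at the current commit
git checkout -b my_second_feature
# go back to the branch the pull request was opened from
git checkout main
# move main back to the last commit that belongs in the pull request
git reset --hard <commit_sha>
# overwrite the branch on your fork (-f is short for --force)
git push -f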
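The first suggestion (one branch per topic, each started from the trunk) can be sketched in a throwaway repository. Everything here is an assumption for illustration: the directory names, the remote name upstream, and the branch names fix-typo and add-docs are hypothetical, and empty commits stand in for real work.

```shell
#!/bin/sh
# Throwaway demo: "upstream" stands in for the original project,
# "work" for your local clone. All names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the original project, with one commit on main.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/main
git -C "$tmp/upstream" commit -q --allow-empty -m "initial"

# Local clone; -o names the remote "upstream" instead of "origin".
git clone -q -o upstream "$tmp/upstream" "$tmp/work"
cd "$tmp/work"

# One branch per topic, each starting from the upstream trunk.
git checkout -q -b fix-typo upstream/main
git commit -q --allow-empty -m "fix typo"
git checkout -q -b add-docs upstream/main
git commit -q --allow-empty -m "add docs"

# Count the commits each branch adds on top of upstream/main.
typo_commits=$(git rev-list --count upstream/main..fix-typo)
docs_commits=$(git rev-list --count upstream/main..add-docs)
echo "fix-typo adds $typo_commits commit(s), add-docs adds $docs_commits commit(s)"
```

Because each branch starts from the trunk rather than from the other topic, a pull request opened from one branch does not pick up commits made on the other.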
CodePudding user response:
Part 2—see part 1 if/as needed
git fetch
To run git fetch, you pick a remote and invoke it as git fetch remote. If you leave out the remote name, Git will pick a remote from somewhere, or try the default name origin, depending on a lot of configuration items. If you only have the one single standard remote named origin, running git fetch with no additional arguments is fine: there's nothing else you could mean anyway.
What fetch does is:
- call up whatever Git software answers the stored URL;
- have them list out all their names (branches, tags, and others) and corresponding hash IDs; and
- obtain, from them, any commits they have that you don't.
Note that this is the same action we had for git clone, except that instead of "get all their commits", it's now "get the commits they have that we don't". Since commits have globally unique IDs, we can easily tell that we have (say) commit a123456 because we have some object with ID a123456, and that we lack (and therefore need) b789abc because we have no such ID. Having obtained their new-to-us commits, our Git now updates our corresponding remote-tracking names.
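To watch this happen, here is a small sketch using two throwaway local repositories. The paths and commit messages are made up for the demo, and a plain local directory stands in for whatever would normally answer at the remote's URL.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the other Git, with one commit on main.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/main
git -C "$tmp/origin" commit -q --allow-empty -m "G"
git clone -q "$tmp/origin" "$tmp/work"

# Someone adds a commit over there after we cloned.
git -C "$tmp/origin" commit -q --allow-empty -m "H"

cd "$tmp/work"
before=$(git rev-parse origin/main)   # our memory of their main
git fetch -q origin                   # get the commits we lack
after=$(git rev-parse origin/main)    # remote-tracking name has moved
theirs=$(git -C "$tmp/origin" rev-parse main)
mine=$(git rev-parse main)            # our own branch is untouched

echo "origin/main before fetch: $before"
echo "origin/main after fetch:  $after"
echo "our own main:             $mine"
```

Only the remote-tracking name origin/main moves; the local branch main stays where it was until you merge, rebase, or reset it yourself.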
In other words, git fetch does pretty much the same thing as git clone, except that our Git repository already exists, we may get a lot less data, and we don't have a final "create a branch and check it out" step. Since we can have more than one remote, we can run:
git fetch origin
and update all our origin/* names, and then run:
git fetch upstream
and update all our upstream/* names, if we've used git remote add to add a second remote named upstream.
To update all our remotes at once, we can use git fetch --all or git remote update; both do essentially the same thing. Note that --all to git fetch means all remotes, not all branches: we already get all branches. (I mention this because people keep thinking --all means all branches and it never does.)
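A minimal sketch of the two-remote setup, again with throwaway local repositories. The paths are hypothetical: upstream stands in for the repository you forked and fork.git for your fork.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the original project.
git init -q "$tmp/upstream"
git -C "$tmp/upstream" symbolic-ref HEAD refs/heads/main
git -C "$tmp/upstream" commit -q --allow-empty -m "initial"

# Stand-in for your fork, and your local clone of the fork.
git clone -q --bare "$tmp/upstream" "$tmp/fork.git"
git clone -q "$tmp/fork.git" "$tmp/work"
cd "$tmp/work"

# Add the second remote; cloning only set up "origin".
git remote add upstream "$tmp/upstream"

# A new branch appears upstream after the fork was made.
git -C "$tmp/upstream" branch dev

# --all means every remote: both origin and upstream get contacted.
git fetch -q --all

upstream_dev=$(git rev-parse upstream/dev)
expected_dev=$(git -C "$tmp/upstream" rev-parse dev)
git branch -r   # lists the origin/* and upstream/* names
```

The fork never received dev, so there is no origin/dev afterwards, only upstream/dev.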
We can, if we want, limit our git fetch like this:
git fetch origin main
This has our Git call up their Git as usual and list things out, but this time, our Git only bothers asking for any new-to-us commits they have on their main. When everything is done, our Git then updates our origin/main (we know where origin's main is now, so our corresponding remote-tracking name, i.e., origin/main, can be updated). If they have new commits on their dev, we don't get them, and we don't update our origin/dev; our Git was told to bother only with main.
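The difference is easy to check in a scratch setup. This is only a sketch: the repositories are local throwaways, and the branch name dev is taken from the text above.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Other repository with two branches, main and dev.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/main
git -C "$tmp/origin" commit -q --allow-empty -m "initial"
git -C "$tmp/origin" branch dev
git clone -q "$tmp/origin" "$tmp/work"

# New commits land on both of their branches after we cloned.
git -C "$tmp/origin" commit -q --allow-empty -m "new on main"
git -C "$tmp/origin" checkout -q dev
git -C "$tmp/origin" commit -q --allow-empty -m "new on dev"

cd "$tmp/work"
dev_before=$(git rev-parse origin/dev)
git fetch -q origin main              # limited to their main
main_after=$(git rev-parse origin/main)
dev_after=$(git rev-parse origin/dev)
their_main=$(git -C "$tmp/origin" rev-parse main)
their_dev=$(git -C "$tmp/origin" rev-parse dev)

echo "origin/main is up to date: $main_after"
echo "origin/dev is stale:       $dev_after (theirs: $their_dev)"
```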
In some (rare) setups, this sort of thing can save a lot of data transfers. Git therefore offers something called a single-branch clone, in which git fetch does this by default. This is where people try to use --all (and it doesn't work): to fetch other branches from a single-branch clone, you must either add them (see the git remote documentation) or use an explicit refspec. We won't cover refspecs properly here, for space reasons, though.
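As a rough sketch of the "add them" route, git remote set-branches --add is one way to widen a single-branch clone. The repositories below are local throwaways and the branch name dev is just an example.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Other repository with two branches, main and dev.
git init -q "$tmp/origin"
git -C "$tmp/origin" symbolic-ref HEAD refs/heads/main
git -C "$tmp/origin" commit -q --allow-empty -m "initial"
git -C "$tmp/origin" branch dev

# Single-branch clone: only main is tracked.
git clone -q --single-branch --branch main "$tmp/origin" "$tmp/work"
cd "$tmp/work"

# Small helper for the demo: does origin/dev exist here?
has_dev() {
    if git rev-parse -q --verify refs/remotes/origin/dev >/dev/null; then
        echo yes
    else
        echo no
    fi
}

after_clone=$(has_dev)
git fetch -q --all                      # all remotes, still only main
after_fetch_all=$(has_dev)
git remote set-branches --add origin dev
git fetch -q origin                     # now dev is tracked as well
after_set_branches=$(has_dev)

echo "origin/dev after clone:        $after_clone"
echo "origin/dev after fetch --all:  $after_fetch_all"
echo "origin/dev after set-branches: $after_set_branches"
```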
Since you will have two remotes, one for your GitHub fork and one for the GitHub repository that you forked, you'll want to run git fetch twice, or use git remote update or git fetch --all now and then. Other than that (and having upstream/*, if you called the second remote upstream as most do), your repository is still just like any other repository.
git push
The git push command is very much like git fetch, with several key differences:
- First, of course, git push means send stuff. You use git fetch to get new commits (and other internal objects) from some other Git (some other software working with some other repository). You use git push to send new commits, often ones you made (but they can be ones you just got from upstream, for instance), to some other Git.
- Second, once you've sent these commits, you are typically going to ask the other Git to set one of its branch names. There is no such thing, on the push side, as a remote-tracking name.
That last part means that you have to have permission to write to the repository. Git itself has no real access controls at all, but most web hosting sites, including GitHub, add theirs on. GitHub in particular adds a lot of fancy controls here. Whether you and/or anyone else make use of them is up to you and them.
To do a git push, you typically run a simple:
git push <remote> <name>
This says that you'd like your Git to look at commits on your branch named <name>, find which ones are new to the other Git at <remote>, send them to that Git, and then ask them, politely, if they would, pretty please, set their name <name> to point to the same commit that your <name> points to.
In other words, you are asking them to create or update their branch with the same name as your branch. In general, they will accept this if and only if this simply adds on to their branch (and you have permissions of course). That is, when we had:
...--G--H <-- main (HEAD), origin/main
because our main matched origin's main, and we added a new commit or two:
          I--J   <-- main (HEAD)
         /
...--G--H   <-- origin/main
and we run git push origin main, our Git calls up their Git, sends them commits I-J, and asks them to set their main to point to J.
If their main still points to H (or somehow points back to G because someone made them drop H), they'll happily accept our request to add on to their main. Since our Git sees their acceptance, we end up with:
...--G--H--I--J <-- main, origin/main
knowing that origin's main now points to commit J.
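The same sequence can be replayed in a scratch setup. This is a sketch under assumptions: origin.git is a local bare repository standing in for the other Git, and the empty commits are labeled H, I and J only to match the drawings.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Seed a bare "origin" whose main ends at commit H.
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/main
git -C "$tmp/seed" commit -q --allow-empty -m "H"
git clone -q --bare "$tmp/seed" "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work"
cd "$tmp/work"

# Add I and J on top of H, then ask origin to take them.
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
git push -q origin main

ours=$(git rev-parse main)
tracking=$(git rev-parse origin/main)
theirs=$(git -C "$tmp/origin.git" rev-parse main)
echo "main:          $ours"
echo "origin/main:   $tracking"
echo "origin's main: $theirs"
```

All three names end up at the same commit, because the push only added on to what origin already had.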
But suppose someone else came along and added some commit K to their main:
...--G--H--K <-- main [over on origin]
Our request will now ask them to ditch their commit K, which would leave them with this:
...--G--H--I--J   <-- main
         \
          K   ???
They will say no, and the error message you will get is not a fast-forward (remember those from merges? this is the same idea).
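Here is a sketch of that refusal, with a hypothetical second clone named other playing the part of "someone else". Git prints its own rejection text on stderr; the exact wording depends on the Git version, so the script only records whether the push was accepted.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare "origin" whose main ends at H, plus two independent clones.
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/main
git -C "$tmp/seed" commit -q --allow-empty -m "H"
git clone -q --bare "$tmp/seed" "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work"
git clone -q "$tmp/origin.git" "$tmp/other"

# Someone else gets commit K onto origin's main first.
git -C "$tmp/other" commit -q --allow-empty -m "K"
git -C "$tmp/other" push -q origin main
k=$(git -C "$tmp/other" rev-parse main)

# We add I and J on top of H and try to push them.
cd "$tmp/work"
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"
if git push -q origin main; then
    push_result=accepted
else
    push_result=rejected
fi

their_main=$(git -C "$tmp/origin.git" rev-parse main)
echo "push was $push_result; origin's main is still $their_main"
```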
You can, using --force or --force-with-lease, try to get them to take the change, losing their new commits, but usually that's the wrong thing to do. For your usage of GitHub, however, sometimes this is the right thing to do on your fork! We'll get back to this later.
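One case where forcing tends to be reasonable is a topic branch on your own fork that you have rewritten locally. The sketch below assumes exactly that: fork.git is a local bare repository standing in for the fork, topic is a made-up branch name, and the rewrite is a simple amend.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Bare stand-in for your fork, and your local clone of it.
git init -q "$tmp/seed"
git -C "$tmp/seed" symbolic-ref HEAD refs/heads/main
git -C "$tmp/seed" commit -q --allow-empty -m "initial"
git clone -q --bare "$tmp/seed" "$tmp/fork.git"
git clone -q "$tmp/fork.git" "$tmp/work"
cd "$tmp/work"

# Publish a topic branch, then rewrite its only commit.
git checkout -q -b topic
git commit -q --allow-empty -m "first draft"
git push -q origin topic
git commit -q --amend --allow-empty -m "second draft"

# A plain push no longer just adds on, so it is refused.
if git push -q origin topic 2>/dev/null; then
    plain_push=accepted
else
    plain_push=rejected
fi

# Forced push that only succeeds if the fork's topic is still
# where our origin/topic says it is.
git push -q --force-with-lease origin topic

ours=$(git rev-parse topic)
theirs=$(git -C "$tmp/fork.git" rev-parse topic)
echo "plain push: $plain_push"
echo "after --force-with-lease, the fork's topic is $theirs"
```

Compared with plain --force, --force-with-lease makes the push fail if the branch on the other side has moved since your last fetch or push, which gives some protection against discarding commits you have not seen.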
CodePudding user response:
Note: this is quite long, but you really need to know these things. I've run out of space (there's a 30k limit on characters) so I'll break this into two separate answers. Part 2 is here.
While "pull requests" are not part of Git (they're specific to GitHub [1]), there are some things we can say about them even without referring specifically to GitHub. Then we can plug in GitHub-specific items later. So let's start with this:
Git is all about commits. While Git commits contain files, Git isn't really about the files, but rather about the commits. And, while we use branch names to find commits, Git isn't really about branch names either: it's really just about the commits.
This means you need to know all about commits: what one is and what each commit, and a string of commits in a row, can do for you.
So we'll start with a quick overview of a commit, and then look at a string of them in a row.
[1] Bitbucket also has "pull requests", but they're very slightly different, and GitLab has "merge requests", which are again same-but-different. All of these build on the same base support in Git proper.
Commits
Each Git commit is numbered. The numbers are not simple sequential counting numbers, though: we don't have commit #1 followed by #2 and #3 and so on. Instead, each commit gets a unique hash ID (unique across all repositories everywhere, even if they're not related to your repository at all [2]) that seems random, but isn't. [3] A hash ID is big, ugly, and impossible for humans to work with: computers can handle them, but our feeble brains become confused.