Disclaimer: I am a beginner to Git so forgive me if parts of the question do not make sense.
A company I applied to sent me a programming assessment.
The assessment page has a link which created a private repository on my GitHub (a repo that I own, not a forked repo). The created repo has an open issue
about the new features they want me to implement, which I have implemented.
They then say that I need to make a "pull request (without merging)" with my changes before I press the submit button on the assessment page. However, I thought pull requests were only for forked repositories, so how do I do a pull request on my own repository? I tried to fork the repository but couldn't. I also tried to open a pull request against an old version of the branch, but I need to commit my solution for this to work, which I am afraid to do because I could be wrong.
Any help is appreciated!
CodePudding user response:
"Pull requests" are a GitHub feature, not a Git feature. Git does not have Pull Requests—not the GitHub style ones, anyway. (Bitbucket, like GitHub, has pull requests, which they call "pull requests", and GitLab, like Bitbucket and GitHub, has pull requests, but they call them "merge requests". So PRs are very common on Git hosting sites, for reasons that become obvious once you start using the hosting sites.)
The tricky bit here is that a pull request ("PR") made from yourself to yourself has no obvious reason to exist. Any PR (of any of the three flavors mentioned above) exists because these three systems support what they call forks. Forking a repository is much like cloning a repository, with two small differences that add up to be quite large. A PR exists because you made a fork. The PR allows you to communicate with the owner of the other repository, with a lot of convenience and some nice features. But if you didn't make a fork in the first place, there is no "other repository": the only person around to communicate with is yourself. That's kind of silly.¹ Can't you just ask yourself your own questions and tell yourself your own stories? Still, the fact that it's easy to make a PR to "the other guy" (the owner of some other repository that you've forked) makes it trivial to make a PR to yourself.
To see what this is all about, let's start with an overview of Git itself.
¹ This isn't necessarily true within a larger organization, such as a company. GitHub, at least, currently have some limitations here that act as barriers to the use of forks within some organizations, so sometimes we will use an organization's repository and make pull requests "to ourself", where "ourself" means "someone else who works for FooCorp and has the same access to the same repository", for instance.
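As a rough preview of what a PR "to yourself" looks like in practice: you put your work on a new branch, push that branch to the same GitHub repository, and then open the pull request in GitHub's web interface with the default branch as the base and your new branch as the compare branch. The sketch below only imitates the Git side of that. A local bare repository (`hosted.git`) stands in for the GitHub repository, and the branch name `my-solution`, the file `app.txt`, and the demo identity are all made up for illustration. Opening the PR itself happens on GitHub, not in Git, so it appears only as a comment.

```shell
#!/bin/sh
set -e

# Placeholder identity so the demo commits work without any global config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Stand-in for the repository GitHub created: one commit on main, copied bare.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/main
echo "starter code" > seed/app.txt
git -C seed add app.txt
git -C seed commit -q -m "starter code"
git clone -q --bare seed hosted.git

# Your own clone, where the actual work happens.
git clone -q hosted.git assessment
cd assessment

# Commit the solution on a new branch; main is left untouched.
git checkout -q -b my-solution            # hypothetical branch name
echo "new feature" >> app.txt
git commit -q -a -m "Implement requested features"

# Send the new branch to the hosted repository.
git push -q -u origin my-solution

# On GitHub you would now open a pull request with base "main" and
# compare "my-solution", and simply not press the merge button.
git log --oneline main..my-solution       # the commits the PR would contain
```

Because the commit lands on `my-solution` rather than `main`, nothing about `main` changes, either locally or on the hosted side, until someone merges.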
Git is a Distributed Version Control System
Git is used as a Version Control System or VCS. Such systems have a pretty long history and a lot of practical knowledge and usefulness behind them. Until the Internet became widely available, though, most version control was centralized: there was some master server (or server-like entity) that maintained the versioning, and you would just extract some files or some version, work on it, and then send updates back to the master control system or site. These centralized VCSes (CVCSes) offer a bunch of features, but have a bunch of drawbacks. So people invented distributed version control, where no single person or entity has the "source of truth" version. Instead, everyone has, or can get, every version. A DVCS (distributed VCS) can be used like a CVCS: one simply points to one of the copies and says "that's the central one". But a CVCS cannot be used as a DVCS, since the key feature of a DVCS is that there is no "the" repository: all repositories are equal, or at least potentially equal (with everything depending on how you use them).
Anyway, being a "D"-style "VCS", everyone using some Git repository will have, or can have, every version. To make this work, a Git repository itself consists, at its heart, of two big databases:
One big database holds commits and other Git support objects. Each commit holds a version of the project: a snapshot of every file, plus some metadata holding information such as the name and email address of the person who made the snapshot. Git stores all this stuff as objects in the objects database.
There's an annoying limitation with this database. Every object in it is found by some big, ugly, random-looking hash ID. Hash IDs are very difficult for humans to use. We are just plain bad at hash IDs. But Git needs the hash IDs to access the objects in the database. So...
A second database—usually much smaller—holds names, such as branch and tag names, and maps each name to one hash ID. Git cleverly arranges things so that if a branch name holds the latest hash ID, Git can use that to find every commit "in" or "on" that branch. (Git's use of branches here is weird, compared to the way branches work in most other VCSes, but we don't have to address that here, and won't.)
The objects database is effectively read-only: you can add new objects, but once you create some object in it, you can't change that object (at all, ever). The names database is read/write: you can add and remove names at will and you (or more normally, Git) can stuff whatever hash ID is appropriate into whatever name is appropriate, at any time. The normal progression is to add a new commit to the big database, and then update one branch name in the little database to remember the latest commit. The latest commit itself remembers the hash ID of what was previously the latest commit, which is how Git can then find earlier commits.
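A small sketch of the two databases at work, in a throwaway repository. The file name, commit messages, and demo identity are placeholders. `git rev-parse` asks the names database which hash ID a name currently holds, and `git cat-file` reads an object out of the objects database.

```shell
#!/bin/sh
set -e

# Placeholder identity so the demo commits work without any global config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # make the first branch "main"

echo "one" > file.txt
git add file.txt
git commit -q -m "first commit"
first=$(git rev-parse main)              # names database: main -> hash ID

echo "two" >> file.txt
git commit -q -a -m "second commit"
second=$(git rev-parse main)             # same name, now a newer hash ID

# Objects database: the newer commit stores the older commit's hash ID.
parent=$(git rev-parse "main^")
echo "main used to hold: $first"
echo "main now holds:    $second"
echo "parent of latest:  $parent"

git cat-file -t "$second"                # the object's type
git cat-file -p "$second"                # tree, parent, author, committer, message
```

The name `main` moved from the first hash ID to the second, while the first commit object itself is unchanged and is still reachable through the second commit's parent link.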
To make this useful on your computer (e.g., on a laptop where you'll build and test your change) you also need a work area. A normal Git repository provides one work area, which Git calls the working tree. A hosted Git repository, as on GitHub for instance, doesn't; Git calls such a repository a bare repository. Because there's no work area, it's literally impossible to do any work directly on these hosted systems.² You won't normally deal with bare repositories, except on these hosted systems.
² GitHub, at least, have since added some fancy methods to allow limited amounts of work directly on GitHub. Because they start with a bare repository (where you can't do work), these systems are kind of weird and klunky, with peculiar limitations. They "do work" by making temporary arrangements to get work done, doing the work, committing, and then tearing down the temporary arrangements, and this gives rise to all the weird limitations. You should not try to get all your everyday work done this way: it's only good for certain limited uses.
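If you want to see the difference between a normal and a bare repository for yourself, here is a minimal sketch using two throwaway directories (the names `hosted.git` and `normal` are arbitrary). Git can report which kind of repository it is looking at.

```shell
#!/bin/sh
set -e

base=$(mktemp -d)

# A bare repository: the two databases, but no working tree.
git init -q --bare "$base/hosted.git"
bare_flag=$(git -C "$base/hosted.git" rev-parse --is-bare-repository)

# A normal repository: the databases live in .git, next to a working tree.
git init -q "$base/normal"
normal_flag=$(git -C "$base/normal" rev-parse --is-bare-repository)
normal_worktree=$(git -C "$base/normal" rev-parse --is-inside-work-tree)

echo "hosted.git is bare:       $bare_flag"
echo "normal is bare:           $normal_flag"
echo "normal has working tree:  $normal_worktree"
```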
Clones
Now, suppose someone else (Fred, or Lakeesha, or whoever) has an existing repository, and you want to do something with the software. You don't need to mess with his or her copy of everything! You just clone the repository. He or she gives you read-only access, and you point your Git to their system and say "copy everything", using git clone.
When you use git clone, what you get is a new Git repository: a pair of databases, plus a work area. The databases are initially totally empty, but the cloning process copies every commit from the other Git, to populate your commits-and-other-objects database.
The slightly peculiar thing about git clone is that it doesn't copy all their names. Instead, it copies their tag names as-is (usually), but takes each of their branch names and changes them. What were their branch names (branches main and develop, say) become your remote-tracking names, origin/main and origin/develop. Basically your Git shoves origin/ in front of each of their names.
The reason for this change-of-names is so that you can have your branch names, and you can use the same names they're using, but remember different "latest commit" hash IDs. Git itself needs, and will use, the hash ID. You'll want your Git software, working in your repository, to remember your new commit, once you make one. They want their Git software, working in their repository, to remember their latest commit.
Giving you your own branch names, and letting them have their own branch names that can be different from yours and/or hold different hash IDs, makes this work. So your clone initially has no branches at all: your Git copies their branches to your remote-tracking names. (Remember, Git itself doesn't actually need branch names. Git only needs the hash IDs. The names are mostly just for humans.)
Once you have this clone, though, your Git will do one last step: it will create a branch name. You get to choose which name to create, but most people running git clone don't bother. In this case, your Git (your software working with your repository) asks their Git (their software working with their repository): What branch name do you recommend? Git calls that the default branch, and on GitHub, this is usually main now, although in the past it was usually master (and many old repositories cling to the old traditions).
Now that your Git has a branch name, your Git finds which commit their branch name means, and creates your branch of the same name so that their latest commit is also your latest commit. Then your Git checks out this branch, meaning it chooses this "latest commit" as the commit you'll be working with / on. Your Git populates your working tree from this particular commit, and now you have files you can work with / on.
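The whole cloning sequence can be tried locally. In this sketch a directory called `theirs` plays the part of the other person's repository (with branches main and develop, as in the example above) and `mine` is your clone. The directory names, the README file, and the demo identity are placeholders.

```shell
#!/bin/sh
set -e

# Placeholder identity so the demo commit works without any global config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# "Their" repository: one commit, two branch names, default branch main.
git init -q theirs
git -C theirs symbolic-ref HEAD refs/heads/main
echo "hello" > theirs/README
git -C theirs add README
git -C theirs commit -q -m "initial commit"
git -C theirs branch develop

# Your clone of it.
git clone -q theirs mine
cd mine

echo "remote-tracking names (copied from their branch names):"
git branch -r

echo "your own branch names (only the default branch was created):"
git branch

# Your main and their main currently name the same commit.
git rev-parse main origin/main
```

Their develop shows up only as origin/develop. You get a develop branch of your own only if and when you ask for one, for instance with git checkout develop.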
GitHub "fork"
Suppose someone Out There On The Net has a repository that you like, that you want to use and maybe even contribute to. If this repository exists on GitHub, GitHub give you a convenient way to do this.³ You navigate, in a browser, to the repository, and then use a button labeled FORK to make a GitHub-side clone that's owned by you. Because this is a clone, you need only read permission to make it.
What's special about this GitHub-side clone is this:
- GitHub do it in a way that's extremely low-cost for them (by avoiding almost all actual copying).
- GitHub make a connection from your clone back to the repository you forked. This enables easy pull requests (the topic I'm addressing, even if it doesn't seem like it