How to resolve Git merge conflict when checking in new code into an empty Bitbucket repository

Time:03-01

Before I post my question, I want to mention that I asked this question in another Stack Exchange site and was told this question needed to be asked in Stack Overflow.

I recently created a new repository in Bitbucket where I intended to check in a new project I have been working on for some time. When I created the new project in Bitbucket, I selected the option to include a .gitignore file. When I tried to push my new project, it resulted in a conflict with this file that I have not been able to resolve. Currently, my code is stuck in my local repository.


What I have tried

  1. I tried to mark the .gitignore as merged in Eclipse. I got this error: Cannot pull into a repository with state: MERGING. I then executed git merge --abort as suggested in this Stack Overflow answer, which failed with:
error: Entry '.gitignore' not uptodate. Cannot merge.
fatal: Could not reset index file to revision 'HEAD'.
  2. After the above error, I tried a hard reset (from Eclipse). Then I tried the abort again (from item 1 above). That resulted in the following error:
fatal: There is no merge to abort (MERGE_HEAD missing).
  3. Then, I tried to do a pull from Git Bash, which resulted in:
$ git pull
error: Pulling is not possible because you have unmerged files.
hint: Fix them up in the work tree, and then use 'git add/rm <file>'
hint: as appropriate to mark resolution and make a commit.
fatal: Exiting because of an unresolved conflict.
  4. I tried to remove .gitignore by running git rm .gitignore and then calling git pull. I got the following error:
fatal: refusing to merge unrelated histories

As you can see, I tried to fix this from both Git Bash and Eclipse, but I have not been able to make any progress. I don't care about losing the contents of the .gitignore file; I can always recreate it. I just need to resolve this conflict by any means necessary so that I can push what is in my local repo to the remote one. I am at a loss.

CodePudding user response:

TL;DR

If you just want to override the .gitignore on Bitbucket, consider using git push --force to discard the initial Bitbucket commit entirely.

If you want to keep that file, grab it out of that commit:

git show origin/master:.gitignore > ignore.bitbucket

for instance, then incorporate the file however you like, and then use a force-push to discard the (single) Bitbucket commit.
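The two options above can be tried safely on throwaway repositories first. The following is a minimal sketch, not a transcript of the real Bitbucket setup: a local bare repository stands in for Bitbucket, and the directory names, file contents, and commit messages are all made up for illustration.

```shell
set -eu
work=$(mktemp -d)
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Stand-in for the hosted repository: a bare repo whose only commit holds a .gitignore.
git init -q --bare "$work/remote.git"
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/master
echo "*.class" > .gitignore
git add .gitignore
git commit -q -m "Initial commit made by the hosting site"
git push -q "$work/remote.git" master

# The local project, with its own unrelated root commit.
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/master
echo "bin/" > .gitignore
echo "class Main {}" > Main.java
git add .gitignore Main.java
git commit -q -m "My project"
git remote add origin "$work/remote.git"
git fetch -q origin

# Optional: save the hosted .gitignore before discarding the commit that holds it.
git show origin/master:.gitignore > ignore.bitbucket

# Replace the remote branch with the local history.
git push -q --force origin master
```

After the force-push, the remote's master names the local root commit, and the hosted file survives only as the untracked ignore.bitbucket copy.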

Long

Here's the root of the problem:

When I created the new project in Bitbucket, I selected the option to include a .gitignore file.

Git is not about files, and hence does not store files, at least not in the sense you're probably thinking of. What Git is about, and what Git therefore stores, is commits.[1] Commits (but not branch names) can be related via ancestry: in the same way that most humans have two parents, most commits have a single parent. So some commit can be the great-great-great-grand-child of some other commit, for instance, or two commits might be siblings (both have the same parent), or other similar relationships.

(These relationships form a Directed Acyclic Graph or DAG. The commit graph in particular allows Git to find common ancestors of two branch-tip commits.)

The commits themselves, which we have Git find by their hash IDs, each hold two things:

  • Each commit holds a full snapshot of every file, in a compressed and read-only format with the contents de-duplicated (across all commits). These snapshots are like tar or rar or winzip archives; they need to be extracted before you can actually use the files (and then the extracted files are not in Git at that point, though there is obviously a copy that is in Git: it's just stored in this compressed and de-duplicated Git-ified format).

  • Each commit also stores some metadata, or information about the commit. The metadata includes the name and email address of the person who made the commit, for instance, along with a date-and-time-stamp showing when they made the commit. In this metadata, Git adds, for its own commit-graph purposes, the raw hash IDs of a list of previous commits—usually just one, so that Git has to walk backwards one commit at a time, from the latest commit to the second-latest, to the third latest, and so on.
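To see both parts of a commit for yourself, git cat-file -p prints the raw commit object. This is a small sketch on a scratch repository; the file name, messages, and identity are placeholders. The tree line names the snapshot, the author and committer lines are the metadata, and a parent line appears only when there is a previous commit.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q .
echo "hello" > README.txt
git add README.txt
git commit -q -m "initial"

# Raw contents of the first commit: tree, author, committer, message.
git cat-file -p HEAD
root_parents=$(git cat-file -p HEAD | grep -c '^parent ' || true)

echo "more" >> README.txt
git commit -q -a -m "second"

# The second commit additionally lists the hash ID of its parent.
git cat-file -p HEAD
child_parents=$(git cat-file -p HEAD | grep -c '^parent ' || true)

echo "root commit parents: $root_parents, second commit parents: $child_parents"
```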

Users, of course, want their files. The files are in the commits, in their stored-for-all-time archives. We must therefore pick some commit and have Git extract that commit, so that we can see and work with some files.

If you have Bitbucket (or any other web hosting site) make a new, totally-empty repository, it will have no commits in it yet. That's fine, but if and when you clone this empty repository, you will get a warning: Git will say that there was nothing for it to check out (which is true!). This means if you make your own new, totally-empty repository, your new empty repository has the same lack-of-commits as your Bitbucket repository: the two are completely identical, in that both have nothing at all.

You can then make your own first commit, in your own repository, on your laptop or wherever it is stored. Being the first commit ever, this commit will have no parent commit. It can't have a parent, as there is no earlier commit, so it doesn't have a parent, and everything is fine. This commit is slightly special though, and Git will tell you that you have made a—or "the"—root commit:

$ git commit -m initial
[master (root-commit) c5f8984] initial
 1 file changed, 1 insertion(+)
 create mode 100644 README.txt

But you didn't do that: you had Bitbucket create a repository and then create its own root commit in that repository, so that their repository is not empty. Their first commit contains a .gitignore file (and perhaps nothing else, or perhaps a README and/or LICENSE file; those are the typical GitHub options).

If you go on and make a root commit in your own initially-empty new repository, these two root commits are not related. They are not parent-and-child. They are not siblings, with a common parent. They are completely unrelated.
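The "completely unrelated" situation can be reproduced with two scratch repositories. In this sketch the directory names and file contents are invented; git merge-base prints the best common ancestor of two commits and exits with a non-zero status when there is none.

```shell
set -eu
work=$(mktemp -d)
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# "Their" repository, with its own root commit.
git init -q "$work/theirs"
cd "$work/theirs"
echo "*.log" > .gitignore
git add .gitignore
git commit -q -m "their root commit"

# "Your" repository, with a different root commit.
git init -q "$work/yours"
cd "$work/yours"
echo "bin/" > .gitignore
git add .gitignore
git commit -q -m "your root commit"

# Obtain their commit without merging it; FETCH_HEAD now names it.
git fetch -q "$work/theirs" HEAD

# Ask Git for a common ancestor of the two root commits.
if git merge-base HEAD FETCH_HEAD >/dev/null; then
  related=yes
else
  related=no
fi
echo "common ancestor found: $related"
```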

I use the terms your Git and their Git as shorthand here to refer to your Git software working with your repository and their Git software working with their repository. When you cross-connect two Gits like this, one of them is a sender and one is a receiver, and the sender will send some or all of his commits to the receiver, if the receiver doesn't have them yet. That's how we synchronize two repositories, especially when one was a clone of the other at some point.

Cloning copies all of the commits (and none of the branches, sort of), so a clone made at some point in some repository's lifetime will start out with the same commits as the original, and any added commits will generally have some sort of ancestry-relationship. But again, this doesn't quite work with an empty repository, since there's no initial commit.

So, what happened in this case is that after you set up the remote, you had your Git call up their Git and get, from them, their root commit that contains a .gitignore file. Meanwhile you had your own root commit that contains a .gitignore file.

It's not clear to me exactly what happened in Eclipse (Eclipse has its own Java-based Git implementation, which doesn't quite do the same things as the C-based implementation you get with command-line Git from bash or other shells). However, it left you in a weird state, which you fixed up, sort of, with your git rm command. At this point git pull from bash ran:

  1. git fetch: any git pull always runs this first. This git fetch had nothing to do though, as you'd already obtained the Bitbucket root commit.
  2. git merge: git pull is shorthand for running git fetch and then a second Git command, and the default second command is git merge.

For git merge to work, it needs to find the best common ancestor between two tip commits. To describe this properly we need some annotations.


[1] More precisely, a Git repository consists of two databases, one holding commits and other internal Git objects, and one holding names. The two databases are simple key-value stores, with the objects database keyed by hash ID, allowing Git to retrieve the raw object contents by hash ID, and the names database pairing individual refs (branch names, tag names, and other such names) with a single hash ID.


Drawing branches in Git

As I mentioned above, each commit is a two-part entity, holding metadata (information about the commit) and data (the snapshot archive, in the Git-ified format). Each commit has a unique[2] hash ID. To draw this, in a format suitable for human comprehension, we need to drop a lot of irrelevant detail, keeping just two things:

  • a name for each commit, and
  • the arrows coming out of the commit that form the arcs in the directed graph.

The result very often looks something like this:

... <-F <-G <-H

where H, on the right, stands in for our latest commit's hash ID. Commit H has metadata that include the raw hash ID of an earlier commit, which we'll call G. Commit G has metadata with the hash ID of still-earlier commit F, and so on. By holding each previous-commit hash ID, the commit can be said to "point to" the earlier commit.

To quickly find any particular commit—in our case, our latest commit H—Git uses a branch name like master or main, or a tag name like v1.2, or some other name like origin/master. This name contains the raw hash ID of the actual commit, so it too can be said to "point to" a commit:

...--G--H   <-- main

What makes a branch name special in Git is that you can check out or switch to some branch so as to extract the files from its latest commit. Running git switch main here tells Git that you'd like to have, to see and work on/with, the contents of the files as they exist saved forever in commit H. Git therefore extracts those files to a work area—your working tree—and, as soon as it has done that, remembers which branch name you asked for. To remember the branch name, Git attaches the special name HEAD, which I draw like this:

...--G--H   <-- main (HEAD)
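The "names point to commits" idea is easy to check on a scratch repository. In this sketch the commit messages G and H simply mirror the drawing, and the tag v1.2 is only an example name; git rev-parse turns a name into the hash ID it holds, and git symbolic-ref shows which branch name HEAD is attached to.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/main   # make the first branch "main" on any Git version
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"
git tag v1.2

branch_hash=$(git rev-parse main)    # hash ID stored under the branch name
tag_hash=$(git rev-parse v1.2)       # hash ID stored under the tag name
head_hash=$(git rev-parse HEAD)      # commit found by following HEAD
attached_to=$(git symbolic-ref HEAD) # the name HEAD is attached to

# Every name in the repository, paired with the hash ID it holds.
git for-each-ref --format='%(refname) %(objectname)'
echo "HEAD is attached to $attached_to"
```

All three names resolve to the same commit here, because the branch, the tag, and HEAD (via main) each lead to H.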

You can create new branch names at any time. Each name must point to exactly one commit. If you were to pick commit G, for instance, to create the new name br1, we would draw it this way:

...--G   <-- br1
      \
       H   <-- main (HEAD)

We're still "on" branch main, and still using the files from commit H, but now there's a direct way to find the hash ID of earlier commit G, rather than, e.g., running git log and finding the hash ID of G manually. But most often, when we create a new branch name, we make it point to the current commit:

...--G--H   <-- br1, main (HEAD)

We can now switch to that branch with git switch br1. This attaches our HEAD to the name br1, and extracts the files from H—except we already have all the files from H, so Git doesn't bother doing anything at all, other than shuffling the attachment of HEAD here:

...--G--H   <-- br1 (HEAD), main

If we now create a new commit, the new commit gets a new, unique hash ID. We will call it "commit I" for simplicity. New commit I will point back to the commit we were using when we made new commit I, i.e., commit H, and once Git has saved away all the files and metadata for new commit I, Git will write I's hash ID into the name to which HEAD is attached:

          I   <-- br1 (HEAD)
         /
...--G--H   <-- main

If we make another new commit J, J will point backwards to I, and Git will write J's hash ID into br1:

          I--J   <-- br1 (HEAD)
         /
...--G--H   <-- main
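The drawing above can be reproduced as follows. This is a sketch on a scratch repository with empty commits named after the letters in the diagram; git checkout is used so that it also runs on older Git versions, and git switch br1 should behave the same on Git 2.23 or later.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

git branch br1            # new name pointing to the current commit, H
git checkout -q br1       # attach HEAD to br1; no files need to change
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"

main_tip=$(git rev-parse main)       # still H
two_back=$(git rev-parse 'br1~2')    # J -> I -> H
current=$(git symbolic-ref --short HEAD)

git log --oneline --graph --all
```

Only br1 moved: main still names H, which is two steps behind the tip of br1.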

If we now switch back to main, Git will remove, from our work area, the files that go with J, and put in the files that go with H instead, and leave us with this:

          I--J   <-- br1
         /
...--G--H   <-- main (HEAD)

We can now create and switch to a new branch br2:

          I--J   <-- br1
         /
...--G--H   <-- br2 (HEAD), main

and as we make more commits, now br2 will get updated:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2 (HEAD)
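With two diverged branches like these, Git can be asked for their best common ancestor, which is the same search git merge performs. A sketch on a scratch repository, again using empty commits named after the diagram's letters:

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "G"
git commit -q --allow-empty -m "H"

git checkout -q -b br1               # create br1 at H and switch to it
git commit -q --allow-empty -m "I"
git commit -q --allow-empty -m "J"

git checkout -q main
git checkout -q -b br2               # create br2 at H and switch to it
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"

base=$(git merge-base br1 br2)       # best common ancestor of J and L
h_commit=$(git rev-parse main)
git log --oneline --graph --all
```

Here the common ancestor is commit H, the point where the two branches forked.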

That, in a nutshell, is how Git branches really work: we add commits to them and they grow, and the branch name always means the last commit on the branch.

Git allows us to move, or even delete, any branch name at any time (although you're not allowed to delete the one HEAD is attached-to; you have to switch away from it first). If we move the name br2 back one step to commit K, we get:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K   <-- br2 (HEAD)
           \
            L   ???

We won't look at how to do this here; we'll just note that commit L still exists, but now you can't find it unless you memorized its random-looking hash ID.[3]
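For readers who want to watch this happen, here is one possible way to stage it on a scratch repository. The choice of git reset --hard is an assumption made for this sketch, since the text does not say which command it has in mind. The commit's hash ID is saved in a shell variable first, standing in for "memorizing" it.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
# Placeholder identity for the throwaway commits (assumed values).
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q .
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "H"
git checkout -q -b br2
git commit -q --allow-empty -m "K"
git commit -q --allow-empty -m "L"

l_hash=$(git rev-parse HEAD)         # remember L's hash ID before moving the name
git reset -q --hard 'HEAD~1'         # move br2 (and HEAD with it) back to K

l_type=$(git cat-file -t "$l_hash")          # the object is still in the repository
holders=$(git branch --contains "$l_hash")   # but no branch name leads to it
echo "L is still a $l_type; branches containing it: '${holders}'"
```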


[2] The unique hash ID is the real key to making Git work. The hash ID is unique across every commit in every repository, so that if two Gits meet, and compare hash IDs, they have the same hash ID if and only if they have the same commit, which they must have gotten by fetching or pushing (or during the initial clone, which is mostly one big fetch).

[3] Git has ways to get commit L back again, but if you leave it unreachable like this for more than a month or so, Git may get around to deciding that you don't care about it, and remove it for real. So commits are not always forever, but removing them is tricky: you arrange for Git not to see them (to have no names that let you and Git find them) and then eventually they will go away, unless they got sent to some other Git that decides to hang on to them.

Because Git is very greedy for new commits, once you've sent a commit somewhere else, it is very hard to get rid of for real. If you think of commits like viruses for which there is no vaccine, you're not far off.
