Set mergetool to only open certain extensions and use local files for others

Time:01-03

I'm trying to merge two branches that have .py and .png files. Is there a way to set all the .png files to be local and only open with mergetool the .py files? I already did git checkout --ours / --theirs to choose the correct .png files to conserve but every time I open the mergetool these files keep popping up. What am I missing here?

Answer:

The git mergetool command runs over the unresolved files by default, so the simplest thing is to mark the .png files as "correctly resolved" first, by running git add on them after your git checkout --ours / --theirs. See the note below, in the longer Details section, for more about this.

That said, you can also supply pathnames to git mergetool, so you can list precisely those unresolved .py files. That's just a small matter of shell scripting:

git ls-files --unmerged

produces the list of such files (with, alas, all the --stage data as well), which you can then filter using, e.g., awk:

git ls-files --unmerged | awk '$4 ~ /\.py$/ { print $4 } ' | uniq

(it's definitely possible to put the uniq into the awk code, but simpler just to run uniq). Verify that this produces the right list of files; it's then trivial to have the shell expand this into place:

git mergetool $(git ls-files --unmerged | awk '$4 ~ /\.py$/ { print $4 } ' | uniq)
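To check the filter before handing the list to git mergetool, here is a minimal sketch that builds a throwaway repository with one conflicted .py file and one conflicted .png file. The repository, the branch name side, and the file names are all made up for the demo. Note that the awk version assumes pathnames without whitespace; the git diff --diff-filter=U form at the end is an alternative way to list unmerged paths.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'x = 1\n' > file.py
printf 'base\0\n' > image.png          # the NUL byte makes Git treat it as binary
git add file.py image.png
git commit -q -m base

git checkout -q -b side                # "their" side
printf 'x = 3\n' > file.py
printf 'theirs\0\n' > image.png
git commit -q -a -m theirs

git checkout -q -                      # back to the initial branch: "our" side
printf 'x = 2\n' > file.py
printf 'ours\0\n' > image.png
git commit -q -a -m ours

git merge side >/dev/null 2>&1 || true # conflicts in both files

# The pipeline from the answer: three index entries per path, collapsed by uniq.
py_files=$(git ls-files --unmerged | awk '$4 ~ /\.py$/ { print $4 }' | uniq)
echo "$py_files"

# Alternative: ask git diff for unmerged (U) paths matching a pathspec.
alt_files=$(git diff --name-only --diff-filter=U -- '*.py')
echo "$alt_files"
```

Both commands should list only the .py file, leaving the .png out of what git mergetool is asked to open.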

Details

When you run git merge—or indeed, any of the various commands that use Git's merge engine, including git cherry-pick for instance—and there is a conflict, Git leaves extra information in what Git calls the index or staging area. These are two names for the same thing, and there's a third name, now largely obsolete, but it still shows up in flags: the cache, as in git rm --cached or git diff --cached. (Some commands, including git diff, accept --staged as a synonym, but git rm still doesn't even as of Git 2.39.)

Normally (that is, when not in the middle of a merge conflict) Git's index, aka staging area, contains a full copy of every file in the form it will have if you run git commit right now.[1] But merging involves reading three versions of each file:

  • the merge base version is the one each "side" started with;
  • the --ours version is the one "we" had in "our" commit when we started our git merge; and
  • the --theirs version is the one "they" had in "their" commit when we started our git merge.[2]

Git starts by looking at all three of these versions. If all three match, there's no problem, and the merge result is any of the three versions. If two match and one is different, there's again no problem: when only one side changed the file relative to the base, Git takes that side's version, and when both sides made the identical change, Git takes that shared version. The usual problem comes in when all three are different. (There can also be some other problems, e.g., when both "sides" created a new but different file with the same name, but we'll ignore these cases for simplicity.)

When all three versions are different, Git really does have to do some merging work. For plain-text files, Git does a line-by-line diff of the base version against each of the branch-tip versions. This results in two separate diffs. Git then tries to combine the two diffs, applying both sets of changes to the base version.

If all goes well—if Git believes it successfully combined "our changes" with "their changes", in other words—Git then treats the resulting merged file as the correct merged result, and writes that file to your working tree, so that you can see it, and to its own index, ready for committing. It's when things go less-well that you get a merge conflict.
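The combining step can be tried in isolation with git merge-file, which runs a three-way merge on three plain files. This is only a sketch: the file names and contents are made up, and it shows one clean merge and one conflicting merge rather than everything git merge does.

```shell
# Scratch directory with hypothetical files; no repository is needed here.
work=$(mktemp -d)
cd "$work" || exit 1

printf 'one\ntwo\nthree\nfour\nfive\n' > base.txt
printf 'ONE\ntwo\nthree\nfour\nfive\n' > ours.txt     # we changed line 1
printf 'one\ntwo\nthree\nfour\nFIVE\n' > theirs.txt   # they changed line 5
printf 'uno\ntwo\nthree\nfour\nfive\n' > other.txt    # also changed line 1

# -p prints the merge result instead of overwriting ours.txt.
# Argument order is: current file, base file, other file.
merged=$(git merge-file -p ours.txt base.txt theirs.txt)
echo "$merged"

# Overlapping changes: the exit status is the number of conflicts,
# and the output carries conflict markers.
status=0
conflicted=$(git merge-file -p ours.txt base.txt other.txt) || status=$?
echo "$conflicted"
echo "conflicts: $status"
```

The first merge applies both sets of changes to the base; the second cannot, because both sides touched the same line.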

Again, remember that Git's index / staging-area normally holds a copy of every file. So if Git is able to merge two sets of changes to, say, README.txt, Git will put the merged version of README.txt into both its index (ready to be committed) and your working tree (so that you can see what Git did). The index copy is at what Git calls "stage zero". But if the merge goes wrong, here's what Git does:

  • Git puts the base version into the index in "staging slot 1" for this file name;
  • Git puts the --ours version into the index in "slot 2" for this file name; and
  • Git puts the --theirs version into the index in "slot 3" for this file name.

The result is that the index now has three files named, e.g., README.txt or file.py or image.jpg.
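You can look at these slots directly. The sketch below sets up a small conflict in a throwaway repository (the branch name side and the file file.py are made up) and then lists the index entries for the conflicted path. The :N:path syntax reads the copy in slot N.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'x = 1\n' > file.py
git add file.py
git commit -q -m base

git checkout -q -b side
printf 'x = 3\n' > file.py
git commit -q -a -m theirs

git checkout -q -
printf 'x = 2\n' > file.py
git commit -q -a -m ours

git merge side >/dev/null 2>&1 || true

# Each line is: mode, blob hash, stage number, then a tab and the path.
git ls-files --stage -- file.py

# Just the stage numbers, on one line.
stages=$(git ls-files --unmerged -- file.py | awk '{ print $3 }' | tr '\n' ' ')
echo "$stages"

# Read individual slots: 1 = base, 2 = ours, 3 = theirs.
base_copy=$(git show :1:file.py)
ours_copy=$(git show :2:file.py)
theirs_copy=$(git show :3:file.py)
echo "$base_copy / $ours_copy / $theirs_copy"
```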

If the file is text and Git attempted a merge, Git puts its best attempt at merging, plus conflict markers, into your working tree, under the file's name (README.txt or whatever, again). If the file is not text, such as image.jpg, Git leaves one complete version in your working tree (normally the --ours version). Git doesn't put conflict markers in because binary files don't have "lines" in the first place.

In all of these cases it is your job to resolve the conflicts. You do this by picking out the right merged result and stuffing that into Git's index at "slot zero", erasing the three conflicting versions. For a file like file.py, you might open the working tree copy in your editor, for instance, and edit it and hand-resolve the conflict. You can then write out the updated file.py and run:

git add file.py

This tells Git: Erase the three nonzero-slot entries, and copy the working-tree version of file.py into the index at slot zero. The file is now "resolved".
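Here is a small sketch of that effect in a throwaway repository. The names are made up, and the printf that overwrites file.py stands in for hand-editing the conflict in an editor.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'x = 1\n' > file.py
git add file.py
git commit -q -m base

git checkout -q -b side
printf 'x = 3\n' > file.py
git commit -q -a -m theirs

git checkout -q -
printf 'x = 2\n' > file.py
git commit -q -a -m ours

git merge side >/dev/null 2>&1 || true

# Count unmerged index entries before resolving.
before=$(git ls-files --unmerged -- file.py | wc -l | tr -d ' ')

printf 'x = 5\n' > file.py      # stand-in for resolving by hand
git add file.py

# Count again, and read the stage number of the one remaining entry.
after=$(git ls-files --unmerged -- file.py | wc -l | tr -d ' ')
stage=$(git ls-files --stage -- file.py | awk '{ print $3 }')
echo "unmerged entries: $before -> $after, remaining stage: $stage"
```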

Git allows the merge to complete—that is, you'll run git merge --continue or perhaps simply git commit—once all files in Git's index are back at "slot zero". Until then, any file at any nonzero slot number is "unresolved". In most cases, if there's a file in "slot 1", there's a file of the same name in "slot 2" and "slot 3" as well. (The cases where this isn't true are ones we are not covering here.)

It's the git add command or—if you want to remove the file—a git rm command that resolves the file, by erasing the nonzero slots and writing the correct to-be-committed file to Git's index at slot zero. Until you git commit, you can then overwrite the slot-zero file with another (also slot-zero) file, so you can do a first pass at resolving and git add and then test, if you like: there's no need to keep files unresolved. But some people do like to keep them unresolved until the very end, so if you are one of those people, just remember how this works: whatever is in slot zero is what gets committed, and git commit won't let you commit until everything that's in Git's index is in there at slot zero.

The point of the staging area is partly to allow these nonzero slots, and partly to allow you to change your mind about what files are in slot zero. You use git add to copy things into Git's index (at slot zero, always) and they're now ready-to-be-committed, but they just sit there, being ready, until you actually run git commit. If you git add again, you replace some file(s) with newer versions; if you do this before you git commit, the versions you replaced never get committed.

With that in mind, we can now look at git mergetool, and also some special weirdness with git checkout and git restore.


[1] Technically, what's in the index is actually a blob hash ID: the index holds pre-compressed, pre-de-duplicated "copies" of files, which take no space if they're literally the same as any existing committed file. But you can just think of them as actual copies of files: except for using no disk space, they behave like copies of files.

[2] Note that for git cherry-pick, the ours and theirs names still make sense. The merge base version is the parent of the "theirs" commit. Since git rebase works by repeated cherry-picking, this also makes sense, except that git rebase starts by checking out "their" branch-tip commit. It then uses git cherry-pick on each of "our" commits. This causes the ours and theirs relationships to change. That's a matter for a separate discussion, though.


Mergetool works by reading the index

You run git mergetool after you run git merge and get merge conflicts. What git mergetool does is look for those nonzero-staging-slot entries in Git's index. Files with a nonzero staging number are unresolved.

If you run git mergetool and resolve some files, and/or use git add to resolve some files, and then interrupt the git mergetool run and start another git mergetool run, Git starts over, listing the unmerged files. If that list of files is now smaller, those are the only ones still unmerged.

Hence, if you have some *.jpg files that you can resolve, you can do that first:

git checkout --ours foo.jpg
git checkout --theirs bar.jpg
git add foo.jpg bar.jpg         # these two files are now resolved

Running git mergetool at this point will not attempt to merge bar.jpg and foo.jpg because they're not unresolved any more.
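With many image files, the same steps can be looped over every unresolved .png. The sketch below assumes you want the --ours version of all of them (swap in --theirs where that is the right choice) and runs in a throwaway repository with made-up names. It relies on git diff --name-only --diff-filter=U to list unmerged paths.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'x = 1\n' > file.py
printf 'base\0\n' > image.png          # NUL byte: treated as binary
git add file.py image.png
git commit -q -m base

git checkout -q -b side
printf 'x = 3\n' > file.py
printf 'theirs\0\n' > image.png
git commit -q -a -m theirs

git checkout -q -
printf 'x = 2\n' > file.py
printf 'ours\0\n' > image.png
git commit -q -a -m ours

git merge side >/dev/null 2>&1 || true

# Resolve every unmerged .png to "our" version, then mark it resolved.
git diff --name-only --diff-filter=U -- '*.png' |
while IFS= read -r path; do
    git checkout -q --ours -- "$path"
    git add -- "$path"
done

# What git mergetool would still have to visit.
remaining=$(git diff --name-only --diff-filter=U)
echo "$remaining"
```

After the loop, only the .py file is still unmerged, so a plain git mergetool run has nothing else to open.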

When git mergetool brings up your actual merge tool (whatever that may be), you're supposed to resolve the conflicts in that merge tool, and then exit the merge tool to tell git mergetool that it is done. The git mergetool command will then run git add for you on that file.[3] That's how git mergetool can later pick up where you left off, after you interrupt it.

This brings us to some oddities with git checkout, some of which are cleaned up in git restore. The checkout and restore commands have flags:

  • --ours means get the file from slot 2.
  • --theirs means get the file from slot 3.
  • There's one missing here: there should probably be a --base to mean slot 1, but it's just not there.

These options tell git checkout and git restore to read the index copy from the given slot, and write that out to the working tree copy of that file. These options don't do anything to the index itself, so the file remains unresolved.

You can, however, also run git checkout commit -- path. This form tells git checkout to reach into the specified commit and find the committed copy of the specified path. Git checkout does this by writing the file to slot zero of the index first, and then to your working tree. This action erases slots 1, 2, and 3. So this kind of git checkout marks the file resolved!
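The difference between the two forms is easy to see side by side. This sketch uses a throwaway repository with two conflicted binary files (image.png and icon.png, both made-up names): one gets git checkout --ours, the other gets git checkout HEAD -- path.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'base\0\n' > image.png
printf 'base\0\n' > icon.png
git add image.png icon.png
git commit -q -m base

git checkout -q -b side
printf 'theirs\0\n' > image.png
printf 'theirs\0\n' > icon.png
git commit -q -a -m theirs

git checkout -q -
printf 'ours\0\n' > image.png
printf 'ours\0\n' > icon.png
git commit -q -a -m ours

git merge side >/dev/null 2>&1 || true

# Form 1: copy slot 2 to the working tree. The index is untouched.
git checkout -q --ours -- image.png
with_ours=$(git ls-files --unmerged -- image.png | wc -l | tr -d ' ')

# Form 2: extract from a commit. This goes through slot zero of the index.
git checkout -q HEAD -- icon.png
with_commit=$(git ls-files --unmerged -- icon.png | wc -l | tr -d ' ')

echo "image.png unmerged entries: $with_ours"
echo "icon.png unmerged entries:  $with_commit"
```

The first file still has its three nonzero-slot entries, so git mergetool would keep offering it until you git add it; the second is already resolved.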

With git restore, you can do the same thing, but the places that git restore writes to are specified by the --worktree (or -W) and --staged (or -S) flags. So:

git restore -SW -s HEAD -- path/to/file

tells git restore to extract our (HEAD) version of the file and write that to both the index (-S) and your working tree (-W). So this too resolves the file, like git checkout HEAD -- path/to/file would. Leaving out the -S means that git restore won't mark the file resolved.

You might wonder why this is all so complicated. The answer is in part "because Git just grew over time", i.e., this wasn't actually planned, it just happened by mistake. It's also in part "because Git commands try to be flexible tools": git restore in particular is more flexible than git checkout as it can write, separately, to Git's index or your working tree or both. The old, more-confusing git checkout command writes to Git's index if extracting from a commit, and always writes to your working tree.

Last, if you want to "un-mark" a file as resolved, i.e., to return it to its unresolved state, both git checkout and git restore have an option, -m, for doing this. Note that it destroys any working tree work you've done on that file, and that -m means something different to git checkout if you're not in "merge mode" (which to me is yet another reason to avoid the old git checkout command, using git switch and git restore instead). Again, I won't cover any of the details here, as this is already long enough.
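For the curious, here is a minimal sketch of the un-marking in a throwaway repository (all names are made up). It resolves a file with git add, then uses git checkout -m to bring the conflict back.

```shell
# Throwaway repository; every name below is hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

printf 'x = 1\n' > file.py
git add file.py
git commit -q -m base

git checkout -q -b side
printf 'x = 3\n' > file.py
git commit -q -a -m theirs

git checkout -q -
printf 'x = 2\n' > file.py
git commit -q -a -m ours

git merge side >/dev/null 2>&1 || true

# Resolve the file (the printf stands in for hand-editing).
printf 'x = 5\n' > file.py
git add file.py
resolved=$(git ls-files --unmerged -- file.py | wc -l | tr -d ' ')

# Re-create the conflict: the nonzero slots come back and the working
# tree copy is overwritten with the conflict-marker version.
git checkout -q -m -- file.py
again=$(git ls-files --unmerged -- file.py | wc -l | tr -d ' ')
markers=$(grep -c '^<<<<<<<' file.py)

echo "unmerged entries: $resolved -> $again, conflict blocks: $markers"
```

The hand-written "x = 5" is gone after the last step, which is the "destroys any working tree work" part.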


[3] Precisely when and whether git mergetool runs this git add is a little tricky because Git doesn't really know whether your merge tool has done the merge. There are several knobs you can configure to tell git mergetool how to interpret results from the merge tool. But if you're using a known merge tool that's already configured correctly, this is all invisible. It's when you want to use your own merge tool that you have to know how to configure this. We won't cover these details here.
