TL;DR: I configured a difftool, and git-diff gives "intelligent" diffs, but git-add creates "stupid" hunks. Why?
I configured the difftool to use nbdime with nbdime config-git --enable --global, which I think essentially just adds these lines to my .gitconfig:
[diff "jupyternotebook"]
    command = git-nbdiffdriver diff
[merge "jupyternotebook"]
    driver = git-nbmergedriver merge %O %A %B %L %P
    name = jupyter notebook merge driver
[difftool "nbdime"]
    cmd = git-nbdifftool diff \"$LOCAL\" \"$REMOTE\" \"$BASE\"
[difftool]
    prompt = false
[mergetool "nbdime"]
    cmd = git-nbmergetool merge \"$BASE\" \"$LOCAL\" \"$REMOTE\" \"$MERGED\"
[mergetool]
    prompt = false
Now git diff gives the good output I expect:
nbdiff /var/folders/6b/03yw1pts2nx_q8vftrh6fv140000gp/T//FILE.ipynb FOLDER/FILE.ipynb
--- /var/folders/6b/03yw1pts2nx_q8vftrh6fv140000gp/T//FILE.ipynb 2022-05-17 14:29:39.937318
+++ FOLDER/FILE.ipynb 2022-05-17 14:09:45.222229
## inserted before /cells/0:
+ code cell:
+ source:
+ ...
+ markdown cell:
+ source:
+ ...
## deleted /cells/0:
- markdown cell:
- source:
- ...
## inserted before /cells/2:
+ code cell:
+ source:
+ ...
But if I do git add -e FOLDER/FILE.ipynb, it gives me a "really bad" diff:
diff --git a/FOLDER/FILE.ipynb b/FOLDER/FILE.ipynb
index 3a1540c..17363f8 100644
--- a/FOLDER/FILE.ipynb
+++ b/FOLDER/FILE.ipynb
@@ -1,621 +1,716 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- ...
- ]
- },
- ... almost every line in the file is removed
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "j1qKT6qtAYEj"
+ },
+ "outputs": [],
+ "source": [
+ ...
+ ]
+ },
+ ... almost every line in the file is added back
I may have a fundamental misunderstanding of what git-add does, but why isn't git add using the nbdime diff tool? And is there a way I can add just the changes that I see in git-diff?
Answer:
Both git add -e and git add -p need to be able to understand an edited diff, because the result is applied to the index as a patch. They comprehend only a limited range of diff formats and require the "dumb" line-oriented format from plain git diff, so they do not run the nbdime diff driver. The nbdime tools take the original files apart, re-shuffle them into readable text, and diff that readable text,[1] but that text is not what's actually in the files, and git add -e needs to work on what's in the files, not some cleaned-up presentation thereof.

[1] What's in the files is machine-readable JSON. The output of the nbdime tools appears to be YAML. If Git had a native JSON diff engine, git add -p and company would be able to deal with the result, but Git doesn't, so they can't. If Jupyter notebooks used YAML, Git's line-oriented tools would be able to deal with them, but Jupyter notebooks don't, so the tools can't.
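
To make the gap between "what's in the file" and "a cleaned-up presentation" concrete, here is a minimal Python sketch. It is not how Git or nbdime is implemented: the tiny notebook dictionary is hypothetical, and the assumption that the two versions differ only in JSON indentation is chosen purely for illustration (the question does not say what caused the large diff). It compares the same data once line by line, roughly what plain git diff and git add -e see, and once as parsed structure, which is closer in spirit to what nbdime compares.

```python
import difflib
import json

# Hypothetical two-cell notebook, reduced to the fields needed for the demo.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "source": ["print(1)"]},
    ]
}

# The same data serialized two ways (assumption: only the indentation differs).
old_text = json.dumps(notebook, indent=1)
new_text = json.dumps(notebook, indent=2)

# Line-oriented view: compare the serialized text line by line.
line_diff = list(
    difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="a/FILE.ipynb",
        tofile="b/FILE.ipynb",
        lineterm="",
    )
)

# Count added/removed lines, skipping the "---" and "+++" file headers.
changed = [
    line
    for line in line_diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Structural view: compare the parsed content instead of the text.
same_structure = json.loads(old_text) == json.loads(new_text)

print(f"changed lines in text diff: {len(changed)}")
print(f"parsed notebooks equal: {same_structure}")
```

The text diff reports many changed lines even though the parsed notebooks are equal. A patch-based tool such as git add -e can only stage what the text diff shows, which is why the structural view cannot be used to build hunks.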