Git add interactive: adding to more than one commit

Let's imagine I have a lot of unstaged changes which I want to split into N commits. If I just do:

git add -p

I will need to go through all the changes N-1 times. It would be much more efficient to do it in one go: for example, instead of pressing y, entering something like "y to <commit name or number>". As I can see, git can't have more than one stage, but perhaps there is some workaround.

CodePudding user response：

As I can see, git can't have more than one stage ...

This isn't strictly true, but building what you need is nontrivial.

Remember that each commit is a full snapshot of every file. What goes into the staging area, that git add -p updates, is a full copy of the file in the form you plan to commit it.

Let's take a simple example. Suppose you've made four changes to some file foo.py: one is a block of changes (one diff hunk) around line 100, one a block of changes around line 200, and the last two around lines 300 and 400 respectively. Conveniently, what you would like to have committed, once you're all done, is foo.py with both blocks 1 and 3 changed but blocks 2 and 4 held off, then—as a second commit—foo.py with all four blocks changed.

Without describing how, let's now suppose that we can make two staging areas S1 and S2. S1 is the regular staging area; S2 is our proposed extra one. What's in S1 right now is the foo.py copy that is in the current commit.

When you run git fancy-breakup-add, it starts by showing you hunk #1. You say "yes, I'd like to stage that in S1". What Git actually does now, given that there is only S1, is this:

extract the file
apply the diff hunk
store the result back

What's in S1 now is foo.py-plus-hunk-1.
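The three steps above can be sketched with plumbing commands. This is only an illustration in a throwaway repository, not what git add -p literally runs: the file content is made up, and sed stands in for "apply the diff hunk".

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com
git config user.name "Example"

# Commit a two-line foo.py, then change both lines in the work-tree.
printf 'a = 1\nb = 2\n' > foo.py
git add foo.py && git commit -q -m base
printf 'a = 10\nb = 20\n' > foo.py

# 1. extract the staged copy of the file
git cat-file -p :foo.py > staged.tmp
# 2. apply one "hunk" (here only the change to line 1; sed stands in for patch)
sed '1s/.*/a = 10/' staged.tmp > staged.new
# 3. store the result back: write a blob, then point the index entry at it
blob=$(git hash-object -w staged.new)
git update-index --cacheinfo 100644,"$blob",foo.py
rm staged.tmp staged.new

git diff --cached   # staged: only the change to 'a'
git diff            # still unstaged: only the change to 'b'
```

The index ends up holding a complete foo.py with one change applied, while the work-tree file still has both.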

Next, you get to see hunk #2. You do want to stage this but not in S1. So we'll stage it into S2. What should foo.py in S2 look like? Let's come back to this in a moment, and move on to hunk #3.

You do want to apply #3 to S1, so what your new command does—which is what git add -p does now—is:

extract the S1 copy of the file (which has hunk #1 added)
apply the diff hunk (adding #3)
store the result back

Note that what's in S1 now is ready to be committed. Of course your fancy program now shows you hunk #4, which you'd like to put into S2 instead; again, we'll come back to this.

That's the full set, so we're now ready to build the next commit from S1—the standard staging area—with a regular old git commit. We run git commit and Git makes a new commit with foo.py with the two changes in it. That's what's in the snapshot.

We're now ready to make use of S2, maybe. What needs to be in foo.py in S2? The answer is clear: we need the foo.py in S2 to have all four diff hunks applied. That way we retain the changes we put into S1. If we didn't apply the S1 changes, we'd revert the commit we just made!

So, the mechanism you need is conceptually simple, but not trivial to build. Every time you stage some diff, you must add it to the staging area you picked and to every staging area that will be committed after it. Then, having finished the entire set of changes, you will commit each staging area one at a time: first, you commit the one with the fewest added changes, then the second one with both the first and second groups of changes, then the third one with three groups of changes, and so on.

You probably won't know in advance how many sets of stages you need, nor the order of those stages. If that's the case, you'll need some sort of fancy way to construct this information.

The git add -p command lets you edit a diff hunk and apply only part of it. If you do this, the edited hunk no longer covers all of the file's changes, so you'll need to compute the remaining changes as a new diff that can itself be edited and partially applied. You'll need to define a mechanism for this (or just say "too hard, not an option yet, to be done in the future" perhaps).

With all that in mind, let's look at the mechanism Git provides for "extra" index files, and why you should not use it. (Instead, you might want to collect patches and use Git's merge engine.) Each work-tree—remember that there may be more than one due to git worktree add—has a primary index. The primary index for the main worktree is in git rev-parse --git-dir, which we'll call $GIT_DIR, and is named index.

What's actually in the index is a set of data giving:

a file's name (the full path name, e.g., path/to/file.ext);
a staging number, which should always be zero: any nonzero stages mean there is a merge conflict that has not yet been resolved and this must be taken care of before proceeding;
a blob hash ID with the proposed-next-commit content;
a mode (100644, i.e., not-executable, or 100755 meaning executable); and
a bunch of cache data that you don't need to worry about.

You can extract all of this with git ls-files --stage and you can update individual entries with git update-index. A quick way to determine whether there are any merge conflicts, which numerous old Git scripts used back when many Git commands were simple scripts, is to invoke git write-tree: if this fails, it tends to mean that there are conflicts. (The other possible failures are things like screwed-up .git permissions or disk-full errors, all of which are not something you want to handle either, so any failure here means "stop".)
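As a rough sketch of that inspection, the following lists the index entries of a throwaway repository and uses git write-tree as the "is the index usable?" check. The repository layout and file name are made up for the example.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com
git config user.name "Example"
mkdir -p path/to
echo 'hello' > path/to/file.ext
git add path/to/file.ext

# Each line reads: <mode> <blob hash> <stage number> TAB <path>
git ls-files --stage

# Entries with a nonzero stage number are unresolved merge conflicts.
conflicts=$(git ls-files --unmerged | wc -l | tr -d ' ')
echo "unmerged entries: $conflicts"

# write-tree fails when the index cannot be turned into a tree,
# for instance while unmerged entries remain; treat any failure as "stop".
if tree=$(git write-tree); then
    echo "index is clean, tree $tree"
else
    echo "stop: conflicts or an unusable repository" >&2
    exit 1
fi
```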

You can copy this standard index to another file. This other file should generally live in $GIT_DIR (though /tmp often works as well) and must have a unique name; the mktemp command can generate these. The file must either not exist, or contain a valid index signature, so you can either remove it after using mktemp to create it (though this risks, very slightly, a race with other mktemp invocations) or copy the index for this working tree over top of the empty file that mktemp made.

You can set the environment variable GIT_INDEX_FILE to make Git commands use your own index instead of the index, e.g., GIT_INDEX_FILE=$tmpindex git add ... or GIT_INDEX_FILE=$tmpindex git update-index ....
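Here is a minimal sketch of the two-staging-area idea using GIT_INDEX_FILE, in a throwaway repository. The names s2 and real_index and the file contents are assumptions for the example, and whole-file git add stands in for hunk-by-hunk staging. Note how the first change goes into both indexes, so the second commit does not revert the first.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com
git config user.name "Example"

printf 'one\ntwo\n' > foo.py
git add foo.py && git commit -q -m base

# S2: a copy of the real index under a unique name inside the Git directory.
real_index=$(git rev-parse --git-path index)
s2=$(mktemp "$(git rev-parse --absolute-git-dir)/index.s2.XXXXXX")
cp "$real_index" "$s2"

# The first change is staged into S1 (the real index) and into S2 ...
printf 'ONE changed\ntwo\n' > foo.py
git add foo.py
GIT_INDEX_FILE=$s2 git add foo.py
# ... and the second change into S2 only.
printf 'ONE changed\nTWO changed\n' > foo.py
GIT_INDEX_FILE=$s2 git add foo.py

# Commit S1 as usual, then commit from S2 on top of it.
git commit -q -m "first change"
GIT_INDEX_FILE=$s2 git commit -q -m "second change"

# The real index still describes the first commit; resync it and clean up.
git reset -q
rm -f "$s2"
git log --oneline
```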

There are three problems here:

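The index (the distinguished index for this working tree) might not be $GIT_DIR/index, for instance if this is an added working tree and $GIT_DIR was computed for the main one, or if GIT_INDEX_FILE is already set. Rather than hardcoding the path, ask Git for it with git rev-parse --git-path index.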
Managing lots of temporary index files like this is a pain.
Updates you make, by creating a new blob and storing this into a temporary index, risk having the blob itself ripped out by Git's garbage collector, git gc. You get 14 days to complete your work by default (and nobody changes these defaults).

The upshot of all of this is that it's probably possible to use multiple staging areas (index files) to do what you want, but it may well not be the right overall algorithm.

A long aside (alongside? not actually that long)

My general rule / method for splitting a commit into N smaller ones (where N is larger than 2) is:

Make the big commit. (Maybe make a branch or tag name first, to make the next part easier.)
Back up one and make a branch here (or check out the branch made above). Use git cherry-pick -n to copy the big commit but not commit it. Then remove everything I don't want and re-arrange as needed, and commit.
Repeat as desired. The cherry-pick -n invokes the merge engine so there's as little hard work required as possible.
When done, this new branch is the branch with the N commits.
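The steps above might look like this in a throwaway repository. The tag name big, the branch name split, and the two files are hypothetical; "remove everything I don't want" is reduced here to restoring one file.

```shell
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com
git config user.name "Example"

echo base > a.txt; echo base > b.txt
git add . && git commit -q -m base

# 1. Make the big commit and give it a name.
echo changed > a.txt; echo changed > b.txt
git commit -q -am "big commit"
git tag big

# 2. Back up one commit and start a new branch there.
git checkout -q -b split big~1

# 3. Copy the big commit without committing, drop what does not belong
#    in this piece (here: the change to b.txt), then commit.
git cherry-pick -n big
git restore --staged --worktree b.txt
git commit -q -m "part 1: a.txt"

# 4. Repeat: the next cherry-pick -n brings in whatever is still missing.
git cherry-pick -n big
git commit -q -m "part 2: b.txt"

git diff --exit-code big split   # no output: same final snapshot
```

Because each cherry-pick -n is a three-way merge against the big commit's parent, changes already committed on the new branch merge cleanly and only the remainder shows up.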

When N is just 2, git add -p, commit, git add, and commit. What you didn't add the first time goes in the second commit, whose snapshot matches the current file.

CodePudding user response：

Try starting git gui: it opens a basic but functional GUI to view and edit the content to be committed.

It has much better support for staging only a subset of the current modifications: you can choose to add a complete hunk ([right click] > "Stage Hunk") or just the lines you selected ([right click] > "Stage Line(s)").

If you have quite a number of changes and intend to create several separate commits (say, 3 or more), I would recommend another workflow:

commit all your work in one single commit, and tag it:

git add -u 
# if you have to add new files, also 'git add' them here
git commit -m "all in one"
git tag wip/allinone

return to the previous commit

git reset --hard HEAD^

then iterate through the following steps:

1. restore the content of that commit on disk:

# restore the content of 'wip/allinone' in the worktree:
git restore -s wip/allinone -W .

2. edit the files (vscode, diff viewer, git gui ...) to keep only the parts you want to commit, and discard the other modifications entirely,
3. validate (now that you have only the changes you are about to commit, you can compile, run unit tests ... without the "other changes" interfering),
4. commit,
5. return to 1.

until you have committed all your changes.

You may run git tag -d wip/allinone afterwards.