Why doesn't git show the revert and reverted commits?

Time:11-24

As shown below, git log finds both the revert commit and the reverted commit, but when I run git log on the file itself, neither commit shows up:

lchen@sh-lchen ~/p/k/v5.15 ((v5.15))> git log --stat --grep='6206b798'
commit 91bed5565bba03b2a9f7334b58ae4be9df7c3840
Author: Jia He <[email protected]>
Date:   Tue Jul 20 21:26:55 2021 +0800

    Revert "qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()"

    This reverts commit 6206b7981a36476f4695d661ae139f7db36a802d.

    That patch added additional spin_{un}lock_bh(), which was harmless
    but pointless. The orginal code path has guaranteed the pair of
    spin_{un}lock_bh().

    We'd better revert it before we find the exact root cause of the
    bug_on mentioned in that patch.

    Fixes: 6206b7981a36 ("qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()")
    Cc: David S. Miller <[email protected]>
    Cc: Prabhakar Kushwaha <[email protected]>
    Signed-off-by: Jia He <[email protected]>
    Signed-off-by: David S. Miller <[email protected]>

 drivers/net/ethernet/qlogic/qed/qed_mcp.c | 23 ++++++-----------------
 1 file changed, 6 insertions(+), 17 deletions(-)
lchen@sh-lchen ~/p/k/v5.15 ((v5.15))>
lchen@sh-lchen ~/p/k/v5.15 ((v5.15))>
lchen@sh-lchen ~/p/k/v5.15 ((v5.15))> git log --oneline drivers/net/ethernet/qlogic/qed/qed_mcp.c | grep "fix possible unpaired"
lchen@sh-lchen ~/p/k/v5.15 ((v5.15)) [0|1]>

lchen@sh-lchen ~/p/k/v5.15 ((v5.15))> git branch HEAD --contains 6206b7981a36476f4695d661ae139f7db36a802d
* (no branch)

Here is my gitconfig excluding user name, mail and HTTP proxy:

[core]
        editor = vim
        whitespace = fix,-indent-with-non-tab,trailing-space,cr-at-eol
        excludesfile = ~/.gitignore
    filemode = false
[am]
    threeWay = true
[core]
    autocrlf = input
    eol = lf
    whitespace = cr-at-eol
[auto]
    crlf = false
[pull]
    rebase = true

The repo is Linux mainline kernel and branch:

lchen@sh-lchen ~/p/k/v5.15 ((v5.15))> git describe HEAD --all
tags/v5.15
lchen@sh-lchen ~/p/k/v5.15 ((v5.15))> git rev-parse HEAD
8bb7eca972ad531c9b149c0a51ab43a417385813

git version:

lchen@sh-lchen ~/p/k/v5.15 ((v5.15))> git --version
git version 2.33.1

Why can't git log --oneline drivers/net/ethernet/qlogic/qed/qed_mcp.c show the revert and reverted commits?

CodePudding user response:

You omitted a few important items, but with the (relatively obvious) clue that the repository in question is that of a Linux kernel, I was able to reproduce it and can diagnose the problem.

The two commits in question are (as you did show):

91bed5565bba03b2a9f7334b58ae4be9df7c3840
6206b7981a36476f4695d661ae139f7db36a802d

These are both reachable from tag v5.15, so if we clone a Linux kernel and check out v5.15 and run git log we will encounter them:

$ git switch v5.15
[output snipped]
$ git log
[output snipped]
/91bed5565bba03b2a9f7334b58ae4be9df7c3840
[the commit shows up - same goes for the other]

These commits are, however, buried deep behind multiple merges. Running git log -- drivers/net/ethernet/qlogic/qed/qed_mcp.c and searching shows that these commits don't turn up, with or without --oneline, unless we add --full-history:

$ git log --oneline -- drivers/net/ethernet/qlogic/qed/qed_mcp.c | grep 91bed5565
$ git log --oneline --full-history -- drivers/net/ethernet/qlogic/qed/qed_mcp.c | grep 91bed5565
91bed5565bba Revert "qed: fix possible unpaired spin_{un}lock_bh in _qed_mcp_cmd_and_union()"

The interesting question is therefore: Why does adding --full-history make the commit show up? The answer is in the documentation, under the History Simplification section. But this may deserve more than just a reference to the git log manual page.

git log

Remember that the job of git log is:

  • to visit some selected subset of the history in the repository; then
  • to show some selected subset of the above subset.

We need to define a few terms too. The first thing we should define is history.

To a first approximation, the history in a repository is the set of commits in the repository. This leaves out a few important subtleties though:

  • Each commit stores a snapshot of all of its files: a more space-efficient version of a set of tarballs or other archives, more or less.
  • Each commit also stores some metadata. The metadata is information about the commit itself, including things like the name and email address of the commit's author.
  • Each commit is a Git object, and these objects are numbered (with hash IDs such as 91bed5565bba03b2a9f7334b58ae4be9df7c3840).

In the metadata for any one commit, you'll find a list of hash IDs for other commits. We call these the parent or parents of the commit. These other commits must actually exist,[1] which means that commits always "point backwards" to their parents, never forwards to any future children.
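
To see the parent links for yourself, here is a minimal sketch on a throwaway repository. The file name f, the demo identity, and the fixed timestamps are all made up for illustration. It reads the parent line out of the raw commit object and compares it with what HEAD^ resolves to:

```shell
set -eu
# Throwaway repository; identity and timestamps are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
t=1600000000
commit() {  # commit staged changes with a strictly increasing timestamp
  t=$((t + 60))
  GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" git commit -q -m "$1"
}

echo one > f; git add f; commit "first"
echo two > f; git add f; commit "second"

# The raw commit object lists its parent(s) by hash ID.
recorded=$(git cat-file -p HEAD | sed -n 's/^parent //p')
resolved=$(git rev-parse HEAD^)
# The root commit has no parent line at all.
root_parents=$(git cat-file -p HEAD^ | sed -n 's/^parent //p')

echo "parent recorded in HEAD: $recorded"
echo "HEAD^ resolves to:       $resolved"
echo "parents of root commit:  '$root_parents'"
```

The first two lines print the same hash ID, and the root commit prints an empty parent list: the links only ever point backwards.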

As a consequence, it's possible to take the set of commits in a Git repository and form them into a Directed Acyclic Graph or DAG. This means we've also defined a way to walk the graph, i.e., visit the history: we pick some start point commit(s) and visit them, then visit their parents, then visit the parents' parents, and so on. Some parents may be already-visited in this particular process, so we have to define whether we're going to use depth-first, breadth-first, or whatever, and make sure we don't visit commits more than once, but that's the gist of what it means to visit commits: we walk through the graph induced via the parent linkage from commits to earlier commits.

This defines two of our terms: we visit (walk through the graph) some selected subset of the history (commits in the repository). We'll then show some subset of the subset, with the word show defined later. There's something important we left out: what are these subsets?


[1] Git doesn't exactly check this rule all the time, but it grows out of rules that Git does check. Sparse and partial clones break the checking, but are not allowed to break the rule, even though Git can't check past the point at which the checking has been deliberately severed.


Subsetting

The subsetting operations in git log are complicated. There are the obvious ones: we don't necessarily start the walk at the most recent commit ever, for instance. Instead, git log takes an argument like HEAD or a branch or tag name to indicate where to start. If we don't give it any of these, it uses HEAD; that was the case here, so it started from HEAD, which is the commit found via tag v5.15.

Then, we can use -n 11, for instance, to make git log show 11 commits and then quit. The number limit is straightforward: each displayed commit decrements the count, and git log exits the "visit more commits" loop early as soon as the remaining count reaches zero.
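
A small sketch of how the -n limit counts, again on a throwaway repository (the file names a and b and the commit subjects are invented for the example). With a path argument, commits that are visited but not displayed do not use up the count:

```shell
set -eu
# Throwaway repository; identity and timestamps are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
t=1600000000
commit() {  # commit staged changes with a strictly increasing timestamp
  t=$((t + 60))
  GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" git commit -q -m "$1"
}

echo 1 > a; echo 1 > b; git add a b; commit "c1: add a and b"
echo 2 > b; git add b; commit "c2: change b"
echo 2 > a; git add a; commit "c3: change a"
echo 3 > b; git add b; commit "c4: change b"

# No path: the two most recent commits are shown.
plain=$(git log -n 2 --format=%s)
# With a path: c4 and c2 are walked past without being counted.
limited=$(git log -n 2 --format=%s -- a)

echo "--- git log -n 2"
echo "$plain"
echo "--- git log -n 2 -- a"
echo "$limited"
```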

If we don't use a path name or various other options (that is, if we run git log instead of git log -- drivers/net/ethernet/qlogic/qed/qed_mcp.c), these are the only real limits we've placed on the subsetting, so we'll visit and show all commits reachable from HEAD, and hence find the two of interest. But if we add a path name, Git needs to make sure not to show commits that are not "interesting".

Perhaps unfortunately, listing a path name turns on two separate subsetting operations. One is in the showing part, which we'll come back to in a moment. The other is in the visiting part.

When Git does the visiting, Git does this with a priority queue. The priority queue starts out holding the commit hash IDs from your git log command: if you ran git log a123456 master v5.15, Git will resolve each of these three things into hash IDs, and place the resulting hash IDs into the queue.[2] If you run git log without any specified starting points, Git resolves HEAD to a hash ID and places that (single) hash ID in the queue.

Git then begins its visit loop, whose structure is more or less as follows:

  • while the queue is not empty
    • take the highest priority commit off the queue
    • show it or don't show it based on subsetting (and maybe exit the loop)
    • insert some or all of its unvisited parents on the queue

That last step is a little weird: some or all? Why only some? Well, that's our first subset operation. And that is where history simplification comes in.

If you use the --first-parent option, the subset operation here is to put only the first parent of any merge commit into the queue. (I'm not sure if / how this interacts with the next option.)
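
As a rough illustration of --first-parent on its own (no path argument), the sketch below builds a tiny history with one merge. The branch name side, the file names, and the commit subjects are all hypothetical:

```shell
set -eu
# Throwaway repository; identity and timestamps are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
t=1600000000
commit() {  # commit staged changes with a strictly increasing timestamp
  t=$((t + 60))
  GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" git commit -q -m "$1"
}

echo base > f.c; echo other > g.c; git add f.c g.c; commit "base"
git checkout -q -b side
echo change >> f.c; git add f.c; commit "Q: change f.c"
git checkout -q main
echo more >> g.c; git add g.c; commit "main: change g.c"
# Merge side into main; side becomes the second parent of M.
t=$((t + 60))
GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" \
  git merge -q --no-ff -m "M: merge side" side

all=$(git log --format=%s)
first_parent=$(git log --first-parent --format=%s)

echo "--- git log"
echo "$all"
echo "--- git log --first-parent"
echo "$first_parent"
```

The plain walk lists Q, which sits behind the second parent of M; the --first-parent walk never queues that parent, so Q is missing from its output.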

If you use a path name, the subset operation here depends on the history simplification mode. With --full-history, Git puts all parents on the queue, but by default, if the merge's stripped-down tree is the same as that of some parent, Git puts only one such parent on the queue. This last phrase is potentially quite confusing, as I've introduced the idea of stripping down a tree and not explained why "the same" is important. To see what these are really about, consult the linked documentation and search for TREESAME.

In our particular case, though, what matters is the one file, drivers/net/ethernet/qlogic/qed/qed_mcp.c, as seen in the merge commit's snapshot and in each of the parent commits' snapshots. Git will pick a parent whose copy of drivers/net/ethernet/qlogic/qed/qed_mcp.c is identical to the merge's copy. If both parents' copies match the merge's, Git still follows only one of them (in practice it always picks the first parent, I think).

So, if some set of commits are behind an ordinary two-parent merge commit, Git may not visit one parent at all, and pursue only the other parent. This happens in this case: drivers/net/ethernet/qlogic/qed/qed_mcp.c is not modified by some merge, so Git follows one of the parents. The two commits you're looking for (the earlier one, 6206b7981a3..., that touches drivers/net/ethernet/qlogic/qed/qed_mcp.c, and the 91bed5565bb... commit that puts it back) are behind one of these pruned chains. So we never even have a chance to visit the two commits, much less decide whether to display them.

If we do visit some commit, we then choose whether or not to display that commit. Here, again, the TREESAME test comes into effect: if the child and parent commits contain different versions of drivers/net/ethernet/qlogic/qed/qed_mcp.c, the "show the commit" part will actually show something (and decrement the -n count, if there is one). If not, it will show nothing at all. So that's the second subset operation. (And show just means what git log prints.)


[2] Caveat: if there are negated items (git log --not origin/master or git log ^origin/master, for instance), any commit that's in the negated range doesn't go into the queue in the first place. Hence git log HEAD --not HEAD shows nothing. This applies to the later parent-insertion step as well.
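
A short sketch of the negation caveat on a throwaway three-commit repository (file name and subjects are invented). Excluding HEAD from itself leaves nothing to show, and excluding an older ancestor cuts the walk off before it:

```shell
set -eu
# Throwaway repository; identity and timestamps are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
t=1600000000
commit() {  # commit staged changes with a strictly increasing timestamp
  t=$((t + 60))
  GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" git commit -q -m "$1"
}

echo 1 > f; git add f; commit "c1"
echo 2 > f; git add f; commit "c2"
echo 3 > f; git add f; commit "c3"

# Everything reachable from HEAD, minus everything reachable from HEAD.
none=$(git log --format=%s HEAD --not HEAD)
# Everything reachable from HEAD, minus c1 (HEAD~2) and its ancestors.
last_two=$(git log --format=%s HEAD ^HEAD~2)

echo "HEAD --not HEAD: '$none'"
echo "--- HEAD ^HEAD~2"
echo "$last_two"
```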


Conclusion: why does adding --full-history make the commit show up?

In this case, the second subset operation keeps us from seeing mainline commits that don't touch drivers/net/ethernet/qlogic/qed/qed_mcp.c. We do want this one! But the first subset operation (the History Simplification part) removes entire commit chains from the graph walk by never entering some of the merge parents' hash IDs into the priority queue. This keeps us from seeing the two commits.

The --full-history option tells Git not to do the first subset operation at all. By skipping it, we make the second one work harder, but see the commits we want to see.
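
The whole effect can be reproduced without a kernel clone. The sketch below is a toy stand-in, not the real history: f.c plays the role of qed_mcp.c, Q and R play the reverted and revert commits, and all names, subjects, and timestamps are invented:

```shell
set -eu
# Throwaway repository; identity and timestamps are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
t=1600000000
commit() {  # commit staged changes with a strictly increasing timestamp
  t=$((t + 60))
  GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" git commit -q -m "$1"
}

echo base > f.c; echo other > g.c; git add f.c g.c; commit "base"
git checkout -q -b side
echo change >> f.c; git add f.c; commit "Q: change f.c"
echo base > f.c;    git add f.c; commit "R: revert Q"   # f.c is back to its base content
git checkout -q main
echo more >> g.c; git add g.c; commit "main: change g.c"
# Merge side into main; f.c in the result matches both parents.
t=$((t + 60))
GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" \
  git merge -q --no-ff -m "M: merge side" side

default_log=$(git log --format=%s -- f.c)
full_log=$(git log --format=%s --full-history -- f.c)

echo "--- git log -- f.c"
echo "$default_log"
echo "--- git log --full-history -- f.c"
echo "$full_log"
```

On this toy history the default run prints only base, while --full-history also prints R and Q. M itself stays hidden in both runs because f.c is identical in M and in both of its parents.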

CodePudding user response:

You essentially have this situation:

    o--Q--o--R--o        side branch (2nd parent)
   /             \
--o--o--o--o--o---M--o   "main" branch

where Q makes a change to a file and R reverts the change, and no other commit makes a change to the same file.

Therefore, at the merge commit M, the contents of the file are identical in both parents of M. In such a situation, when git log is invoked with a path limitation, history simplification picks one of the parents, usually the first one, and follows only that branch. Git essentially ignores the side branch because it does not contribute anything that is necessary to explain the state of the file at M. (After all, in total, the side branch did not make any change to the file.)
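
One way to check the "identical in both parents" condition directly is to compare blob hash IDs. This is a sketch on a toy repository shaped like the diagram, with f.c as a hypothetical stand-in for the file and invented commit subjects:

```shell
set -eu
# Throwaway repository; identity and timestamps are placeholders.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
t=1600000000
commit() {  # commit staged changes with a strictly increasing timestamp
  t=$((t + 60))
  GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" git commit -q -m "$1"
}

echo base > f.c; echo other > g.c; git add f.c g.c; commit "base"
git checkout -q -b side
echo change >> f.c; git add f.c; commit "Q: change f.c"
echo base > f.c;    git add f.c; commit "R: revert Q"
git checkout -q main
echo more >> g.c; git add g.c; commit "main: change g.c"
t=$((t + 60))
GIT_AUTHOR_DATE="$t +0000" GIT_COMMITTER_DATE="$t +0000" \
  git merge -q --no-ff -m "M: merge side" side

# Blob hash of f.c in M, in each parent of M, and in Q (parent of R).
at_m=$(git rev-parse HEAD:f.c)
at_p1=$(git rev-parse HEAD^1:f.c)
at_p2=$(git rev-parse HEAD^2:f.c)
at_q=$(git rev-parse HEAD^2^:f.c)

echo "M:        $at_m"
echo "parent 1: $at_p1"
echo "parent 2: $at_p2"
echo "Q:        $at_q"
```

The first three hashes are equal, so M matches both parents for this path; only Q, in the middle of the side branch, holds a different blob.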
