Git: go back to a specific commit ID without deleting the history

Time:02-26

Here is my commit log. I want to switch back to a specific commit ID (for example, Second). When I use git checkout, it works, but I am no longer able to switch back to the latest commit (Fourth).

HEAD then points to the second commit, and when I run git log, nothing after that commit shows up.

How can I switch between my commits without deleting the history?

commit 61c71a9e5a6d9e29a4172e687172dd4b8523eb4a (HEAD -> main)
Author: mohhhe <[email protected]>
Date:   Fri Feb 25 19:08:36 2022 +0330

    Fourth

commit 9c3e8919cfa2c970f14056eef34ca12b49025f65
Author: mohhhe <[email protected]>
Date:   Fri Feb 25 19:08:13 2022 +0330

    Third

commit d33795596001197f382038a72d20faf0cfbe7ab7
Author: mohhhe <[email protected]>
Date:   Fri Feb 25 19:07:55 2022 +0330

    Second

commit 2fe7b1d8270fcfb41d73e69293da10734e37b069
Author: mohhhe <[email protected]>
Date:   Fri Feb 25 19:07:39 2022 +0330

    First

Answer:

An alleyway analogy

Imagine, for a moment, that you're at the entrance to a narrow street or alleyway in a big city, surrounded by skyscrapers. Looking down the alleyway, you can see a series of dumpsters. Now, walk halfway down the alleyway and look ahead of you. Half the dumpsters are gone! Where did they go? Nowhere: they're right behind you.

The same idea applies here: Git did not delete the commits you can't see. You just can't see them. Move back to a vantage point from which you can see them, and you'll see them again.

Reality, such as it is

In Git, a commit is a two-part entity: it holds a snapshot of all files—well, all the files that Git knew about, at the time you (or whoever) made that snapshot—and some metadata. Each commit is numbered, with a big, ugly, random-looking hash ID, like 61c71a9e5a6d9e29a4172e687172dd4b8523eb4a as shown in your output.

The hash ID is what Git needs to find the commit. The commit itself is stored as a bunch of parts, using a commit object and other internal supporting objects. The hash ID you see here is that of the commit object itself, which holds only the metadata: the snapshot is in a tree object, which has yet more sub-objects. But you don't normally need to know this; what you do need to know is that Git has a big database holding all of its objects, each of which is numbered, and that Git itself needs the number to retrieve the object.1
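A minimal sketch for poking at the object database yourself, assuming a throwaway repository in a temporary directory, a placeholder identity (demo, demo@example.com) and a made-up file named file.txt. It uses git cat-file to look up objects by hash ID, showing that the commit object holds only metadata while the snapshot lives in a separate tree object:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
echo hello > file.txt                      # hypothetical file, so the snapshot is not empty
git add file.txt
git commit -q -m First

commit=$(git rev-parse HEAD)               # the commit's full hash ID: the lookup key
git cat-file -t "$commit"                  # prints: commit
git cat-file -p "$commit"                  # metadata only: tree, author, committer, message
tree=$(git rev-parse "$commit^{tree}")     # the snapshot is a separate tree object
git cat-file -t "$tree"                    # prints: tree
git cat-file -p "$tree"                    # one entry: a blob named file.txt
```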

Humans, however, are very bad at numbers. What were those four hash IDs again? They're not worth memorizing anyway: Git offers you a very fast way to find one of those four hash IDs. The name main, which is easy for you to remember, finds one of them.

Over time, the one hash ID that main finds may change, but right now, it finds 61c71a9e5a6d9e29a4172e687172dd4b8523eb4a for you, and for Git. That commit is the latest commit on the branch main. It is the latest by definition, because the name main holds its ID. So if you want Git to find the latest commit on main, you can simply ask Git for main: Git will look up the name main, find that ID, and hence find that commit.
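Here is a small sketch, assuming a throwaway repository that replays the question's four commit messages as empty commits (git commit --allow-empty) with a placeholder identity. The hash IDs it prints will differ from the ones in the question, but the name main still resolves to the Fourth commit:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main      # call the first branch "main"
# Empty commits stand in for real snapshots; only the messages matter here.
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

git rev-parse main                         # the one hash ID the name main holds right now
git log -1 --format=%s main                # that commit's message: Fourth
```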

If and when you make a new commit, here's what Git will do (in some order or another; you don't really get to see if there's any particular order to this):

  • Make a snapshot of every file that Git knows about. To make Git see any update you've made to a file that Git already knows about, you must run git add on it. To make Git see any new file you created that did not exist until now, you must run git add on it. There's a lot more to it than this, but that's the first approximation to the reason you have to keep running git add: to tell Git that the new snapshot should use the new or updated file.

  • Gather up a bunch of metadata. The metadata Git will gather includes your name (from your user.name setting) and email address (from your user.email setting). It includes the current date-and-time, down to the second. And it includes the hash ID of the commit that is currently the latest one on the branch you're on (in this case, main).

Git writes all this out to make a new commit, which gains a new, unique, never-used-before, never-will-be-used-again, hash ID. This hash ID must never occur in any Git repository except to be used to identify this commit that you just made right now. (That's why the hash IDs are so big and ugly: so they can be unique.)

Git then stores the new commit's hash ID in the current branch name. So now the name main selects your new commit—the one you just made.
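A sketch of that last step, assuming a throwaway repository, a placeholder identity, and empty commits standing in for real snapshots: it records which commit main names, makes one more commit (the message Fifth is made up for the demo), and shows that main now names the new commit, whose metadata lists the old tip as its parent.

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

before=$(git rev-parse main)
git commit -q --allow-empty -m Fifth       # the new commit
after=$(git rev-parse main)
echo "main moved from $before to $after"
# The new commit's metadata: who made it, and its parent (the old tip of main).
git log -1 --format='author: %an <%ae>%nparent: %P' main
```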


1That's because this big database is a key-value store, with the hash IDs being the keys. There's a slow method of walking the entire database and getting every <key, value> pair, but this takes many seconds, or even minutes, in a big repository: far too slow to be useful. A key lookup takes milliseconds, so that's what you want Git to be doing.


Commits thus form backwards-looking chains

What this all means is that the name main automatically and always selects the last commit in the branch named main. By definition, main is the end of the street / alleyway / superhighway / motorway / whatever it is. You add new commits by making new commits while you're on that "road", and that extends the "road" a bit further.

Another way to show this is to draw the commits using uppercase letters to stand in for the real hash IDs. Here, we have your original four commits, which we'll call A, B, C, and D for short:

A <-B <-C <-D   <--main

The name main will "point to" (contain the hash ID of) the last of these commits, commit D. Commit D has a snapshot—a copy of all the files, frozen for all time—and some metadata, and D's metadata says that the previous commit is commit C. We say that D points to C.

Commit C, of course, has a snapshot and metadata. The snapshot holds the files that Git knew about at the time you made C, frozen for all time, and the metadata holds the date-and-time and so on, including the hash ID of earlier commit B. We say that C points to B.

Commit B holds a snapshot and metadata too, and points backwards to commit A, which holds a snapshot and metadata. But commit A was the very first commit you made, in what had been, up until you made A, a totally-empty repository. So commit A doesn't point further backwards: it can't.
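To watch the backwards-pointing chain, this sketch (again a throwaway repository with a placeholder identity and empty commits named after the question's messages) prints each commit's stored parent hash ID, then asks git rev-list for the commits that have no parent at all:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

# Each line: short hash, message, and the parent hash ID stored in the metadata.
git log --format='%h %s (parent: %p)' main
# Only the root commit, First, has no parent.
root=$(git rev-list --max-parents=0 main)
git log -1 --format=%s "$root"             # prints: First
```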

That's how your four commits are, in your repository. They can never change! They are completely read-only, and those four hash IDs are now used up forever.2 The name main points to the last one—until you make a new commit. Then new commit E springs into being, pointing backwards to D, and Git updates the name main to point to E:

A <-B <-C <-D <-E   <--main

2Guaranteeing this is technically impossible, since there are only finitely many hash IDs. Git doesn't really try to prevent anyone else from getting the same hash ID, except by using cryptographic trickery to make a collision so unlikely that we don't have to worry about it. Nobody will accidentally re-use your hash IDs, and the crypto makes it hard to do on purpose, too.


Driving back into the past

But what happens when you want to visit an old commit? You ran:

git checkout d33795596001197f382038a72d20faf0cfbe7ab7

to tell Git to erase, from your work area, all the files that are safely stored forever in commit D, and go back to commit B: extract the stored-forever files from commit B into your work area. Git did that, and then git log showed you commits B and A and stopped. Why?

Git uses your HEAD to be able to see things

Git has a very special name, HEAD, that is not a branch name at all.3 Instead, this name HEAD is normally attached to a branch name. That's what your first git log shows:

commit 61c71a9e5a6d9e29a4172e687172dd4b8523eb4a (HEAD -> main)

Git has the name HEAD "pointing to" the name main here. I like to draw it this way instead:

A--B--C--D   <-- main (HEAD)

with the name HEAD "attached to" the name main. (I also got lazy about drawing the arrows between commits. Just remember that the connecting lines, from A to B to C to D, are really backwards-pointing arrows.)

Running git log tells Git: First, use HEAD to find a commit. Since HEAD is attached to main, Git uses main to find commit D. The git log command then shows you commit D—well, shows it by default; there are options you can give git log to change this—and then follows D's arrow back to C and shows C. Then git log follows C's arrow to B, and shows B, and follows B's arrow to A and shows A. Commit A has no backwards arrow, so git log can finally stop.
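A quick sketch of both ideas, assuming a throwaway repository with the question's four messages as empty commits: git symbolic-ref shows which branch name HEAD is attached to, and a plain git log walks back from there.

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

git symbolic-ref HEAD        # prints: refs/heads/main (HEAD is attached to main)
git log --format=%s          # HEAD -> main -> Fourth, then back through Third, Second, First
```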

When you git checkout a commit by its hash ID, however, Git goes into what Git calls detached HEAD mode. Here, the name HEAD is no longer attached to a branch name. Instead, it points directly to a commit. If you choose commit B, you get this:

A--B   <-- HEAD
    \
     C--D   <-- main

The git log command works as before: it uses HEAD to find a commit. But this time HEAD leads straight to commit B, instead of to the name main and from there to commit D. So git log shows B, follows B's arrow back to A and shows A, and then runs out of commits to show and stops.
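The detached case can be sketched the same way, again assuming a throwaway repository with empty commits named First through Fourth. Here main~2 finds the Second commit instead of a typed-in hash ID, and advice.detachedHead=false only silences Git's usual detached-HEAD message:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

second=$(git rev-parse main~2)             # the commit whose message is Second (B)
git -c advice.detachedHead=false checkout -q "$second"
# symbolic-ref -q fails quietly when HEAD holds a hash ID instead of a branch name.
git symbolic-ref -q HEAD || echo "detached HEAD at $(git rev-parse --short HEAD)"
git log --format=%s                        # prints Second, then First
```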

If you want to see all your commits, you can:

git checkout main

which switches back to branch main, re-attaching your HEAD:

A--B--C--D   <-- main (HEAD)

and now you're starting git log from the end of the road—the last commit on main—and you'll see all four commits. Or, you can run:

git log main

which tells git log that it should use the name main to look up the commit to start with. This will find commit D, even though HEAD is still pointing directly to commit B.
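Both escape routes, sketched in a throwaway repository (placeholder identity, empty commits named after the question's messages):

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

git -c advice.detachedHead=false checkout -q main~2   # detach at Second, as in the question
git log --format=%s main     # start from main instead of HEAD: all four commits
git checkout -q main         # re-attach HEAD to main
git symbolic-ref HEAD        # prints: refs/heads/main
```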


3It's technically possible to create a branch named HEAD. Don't do it.


More than one branch name

Once you understand the above, you're ready to handle multiple branch names. Suppose we have this:

A--B--C--D   <-- main (HEAD)

and we create a new name, such as develop, pointing to commit D, by running:

git branch develop

We now have this:

A--B--C--D   <-- develop, main (HEAD)

That is, both names, develop and main, point to commit D. The special name HEAD is currently attached to the name main though. Let's make a new commit on main, commit E, and draw it in:

           E   <-- main (HEAD)
          /
A--B--C--D   <-- develop

Commit E is now the latest commit on main, while commit D continues to be the latest commit on develop.
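A sketch of these two steps, assuming a throwaway repository with four empty commits and a made-up message, Fifth, for commit E:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done

git branch develop                         # develop and main both point to Fourth (D)
git commit -q --allow-empty -m Fifth       # E: only the checked-out branch, main, moves
git log -1 --format=%s main                # prints: Fifth
git log -1 --format=%s develop             # prints: Fourth
```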

If you now run:

git checkout develop

or:

git switch develop

to switch to branch develop, we get:

           E   <-- main
          /
A--B--C--D   <-- develop (HEAD)

Commit E still exists, but Git will take all of E's files out of our work area, and put in all of D's files instead. The name HEAD is now attached to the name develop, not the name main, so git log will show commits D, C, B, and A and then stop. Running git log main will show E, then D, then C, and so on.
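A sketch of the switch, assuming a throwaway repository with four empty commits plus a made-up Fifth commit standing in for E:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done
git branch develop
git commit -q --allow-empty -m Fifth       # E, on main only

git checkout -q develop                    # git switch develop does the same here
git symbolic-ref HEAD                      # prints: refs/heads/develop
git log --format=%s                        # Fourth, Third, Second, First
git log --format=%s main                   # Fifth, then the same four
```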

Note that commits up through D are on both branches. But now that we're on develop instead of main, let's make another new commit:

           E   <-- main
          /
A--B--C--D
          \
           F   <-- develop (HEAD)

Commits A through D are still on both branches, but now main and develop each have one commit that the other branch doesn't have. The two names pick the latest commits, which are E and F. E is the latest main-branch commit and F is the latest develop-branch commit. They're both "the latest commit"! If we make another new commit on develop, like this:

           E   <-- main
          /
A--B--C--D
          \
           F--G   <-- develop (HEAD)

then the two latest commits are now E and G. Each branch name "means" that particular commit, which is by definition the latest commit on that branch. Moreover, all the commits you (or Git) can find by starting at that "latest" commit, and working backwards, are "on" that branch. So when we have:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K--L   <-- br2

we have three latest commits, and commits up through H are on all three branches. Pick one name to check out, and that's the set of commits you'll see with git log; the files in your work area will be those from that latest—or tip—commit.
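One way to check which branches a commit is "on" is git branch --contains, which lists the branch names from which that commit can be reached by working backwards. This sketch builds the main/develop shape drawn earlier in a throwaway repository; the messages E, F and G are just labels on empty commits:

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in First Second Third Fourth; do
  git commit -q --allow-empty -m "$msg"
done
git branch develop
git commit -q --allow-empty -m E           # latest commit on main
git checkout -q develop
git commit -q --allow-empty -m F
git commit -q --allow-empty -m G           # latest commit on develop

d=$(git rev-parse main~1)                  # Fourth (D), where the branches split
git branch --format='%(refname:short)' --contains "$d"     # develop and main
git branch --format='%(refname:short)' --contains main     # main only
```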

Note that the commits never change: once you make a commit, it is good forever. However, we find commits through branch names, and those do move about. If we take the last example and move the name br2 back one hop:

          I--J   <-- br1
         /
...--G--H   <-- main
         \
          K   <-- br2
           \
            L   ???

we may never be able to find commit L again. It has become "lost", as no name leads to its hash ID any more. As long as we can find J and K, though, we can't lose H, even if we completely delete the name main. Deleting that name just means we no longer have direct access to commit H: we have to find it by working back one step from K, or two steps from J.
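The lost-commit scenario can be reproduced in a throwaway repository, as a sketch: the messages G through L are only labels matching the drawings, and saving L's hash ID in a shell variable is something you would not normally have. The checks use git branch --contains and git rev-list --all, which follow names only; Git's reflogs are not consulted here.

```shell
#!/bin/sh
set -e
# Placeholder identity and a throwaway repository (assumptions for this demo only).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
for msg in G H; do git commit -q --allow-empty -m "$msg"; done
h=$(git rev-parse main)
git branch br1
git branch br2
git checkout -q br1
git commit -q --allow-empty -m I
git commit -q --allow-empty -m J
git checkout -q br2
git commit -q --allow-empty -m K
git commit -q --allow-empty -m L
lost=$(git rev-parse br2)                  # L's hash ID, noted only for this demo

git checkout -q br1                        # Git won't force-move the checked-out branch
git branch -f br2 br2~1                    # move br2 back one hop, to K
git branch --format='%(refname:short)' --contains "$lost"   # prints nothing
git branch -q -D main                      # delete main; H is still reachable
git rev-parse br2~1                        # H, one step back from K
git rev-parse br1~2                        # H again, two steps back from J
```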
