How to go back in time and open the working repo but still save the current changes

Time:03-12

My current project has a bug and I can't seem to find it. A big mistake is that I have changed A LOT of stuff on my files and haven't committed any of it. I know that on my last commit the app was perfect and there were no bugs. So, my question is how do I go back in my history log and open the project at that time where I know there were no bugs without affecting the current changes I have already done? Can I checkout on a new branch and restore my project on the last commit? Thanks in advance.

CodePudding user response:

Since you have changed the files but not committed, you can stash the files first:

git stash

You will then be back at the last committed, working state of the app (test it out). Don't worry: all your changes are saved in the stash.

Then create a new branch

git checkout -b new_branch_name

and pop those changes onto new_branch_name:

git stash pop 

Voila, your changes (bug included) are now in the working tree on new_branch_name. Commit them there, and your perfect app stays untouched on the main or master branch. So now you have two branches:

  1. main (or master), with the app in its perfect, working state.
  2. new_branch_name, with the working app plus the changes that caused the bug.

CodePudding user response:

There's no need to use git stash across the new-branch-creation step (as in niko's answer). That is, instead of:

git stash
git checkout -b new_branch_name
git stash pop

(and then committing here on the new branch), you can just run:

git checkout -b new_branch_name

(and then commit here on the new branch). It won't hurt to use the git stash commands in this particular case, but there's no need. In general I recommend avoiding git stash as much as possible: this particular Git command has had quite a few bugs in various Git versions over the years, some of which can actually lose files.[1]
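If you want to convince yourself before trying this on real work, here is a minimal sketch in a throwaway repository. The temporary directory, the file name app.txt, and the commit messages are all made up for the demonstration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"                # throwaway repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main          # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "working version" > app.txt
git add app.txt
git commit -q -m "last known-good commit"

echo "lots of uncommitted edits" > app.txt     # the buggy, unsaved work

git checkout -q -b new_branch_name             # no stash needed
git status --short                             # app.txt is still listed as modified

git commit -q -a -m "WIP: changes that introduced the bug"
git checkout -q main
cat app.txt                                    # the known-good content is back
```

The work-in-progress commit lives only on new_branch_name, so you can switch between the two branches to compare the working and the broken versions.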

It's important to realize that this is a special case. To see why, read on, but if you don't really care, you can stop here.


[1] The problem here is that git stash does far too many different things, many of which were inadequately tested. This situation is gradually improving: there are now tests for the known cases that destroyed files irrecoverably, so that the old bugs will not recur. However, the rewrite of git stash from shell script (in Git 2.21) to C code (in Git 2.22) introduced a whole new crop of bugs at that time, and even now, in Git 2.35, there are a few known bugs (though they are far less serious).

Still, I think using git stash is a bit like driving on a cliff road that's killed 100 people in the last 20 years. "Oh, we fixed the deadly thing at mile-marker 22": OK, sure, but ... a road that's been that bad for that long probably still has other lurking dangers. As Ecclesiastes doesn't quite say, "The race is not always to the swift, nor the battle to the strong; but that's the way to bet."


Background (long)

A big mistake is that I have changed A LOT of stuff on my files and haven't committed any of it.

Commits (and the creation of branch names) are cheap. Use them. You can commit stuff that doesn't work, just to save the state, before you try something different, too. If you're working on experimental branches, you can leave those broken commits in the experimental branch, and switch back to a working version and then start a new experimental branch. As long as you never git push your experimental branches to some other repository, nobody will even know about your experiments, and you can discard them later.

The issue here in Git is easy to put in one sentence, which sounds self-contradictory, but is true: The files you work on and with, when working in Git, are not in Git.

What this means is simple enough: Git is, in the end, all about commits. It's not about files, although files are kept inside commits, and it's not about branches either, although branch names help you (and Git) find commits. Git is about the commits.

Each commit:

  • Is numbered. The "number" of a Git commit is a universally unique ID (a UUID, or a GUID if you prefer the term "globally unique"). Once you make a commit, it has an assigned ID, and that ID can no longer be used in any Git repository anywhere unless it is for that commit.[2] That's why commit hash IDs are so big and ugly and random-looking, and impossible for humans to use directly. If they weren't so big and random-looking they'd accidentally get re-used.

  • Is completely read-only. Once you make a commit, nothing—not even Git itself—can change it. This is because the hash ID is a cryptographic checksum of the contents. Should you take a commit out of the repository and make some changes and try to put it back in, what you get is a new commit with a new and different hash ID. The old commit is still there, unchanged.

  • Stores two things:

    • A commit stores a full snapshot of every source file. These files are, like the commits, completely unchangeable. To keep the repository from growing hugely fat, the files are not only compressed—sometimes highly compressed—but also de-duplicated. Since the files are read-only, it's entirely safe to share some version of README.md's content with as many commits as would like to use it. That version of that content literally can't change, so it's safe to re-use it.

      This means that although each commit "stores" every file, a new commit only adds a file to the repository if that file's content is different from every other already-stored file. There's a little bit of space used for the commit itself, but a new commit that exactly matches an old one—these can occur if, e.g., you use git revert to back out some commit—literally takes no space to store its files.

    • Meanwhile, each commit stores some metadata: information about this particular commit. That information includes the name of the person who makes the commit (your user.name setting, if you just made the commit) and their email address (your user.email). It includes the current date-and-time as read from the computer's clock. It includes any log message you'd like to put in, describing why you made this commit. And—crucially for Git's own operation—it includes a list of hash IDs. This list usually has just one hash ID in it: that's the parent of the commit.
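You can look at both parts of a commit yourself. The following is a small sketch in a scratch repository; the file names and commit messages are invented for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

echo "code" > main.txt                 # README.md is left untouched
git add main.txt
git commit -q -m "second commit"

# Raw commit object: a tree (the snapshot), a parent, author and
# committer lines with timestamps, and the log message.
git cat-file -p HEAD

# The parent recorded in the metadata is the first commit's hash ID.
parent=$(git cat-file -p HEAD | sed -n 's/^parent //p')
[ "$parent" = "$first" ] && echo "parent matches first commit"

# De-duplication: both snapshots refer to the same stored README.md content.
git rev-parse HEAD:README.md HEAD~1:README.md   # prints the same blob ID twice
```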

It's this metadata that makes Git work. If we substitute in single uppercase letters for real hash IDs, and imagine a small, three-commit repository, it might look like this:

A <-B <-C

Commit C is our third (and current) commit, and it contains a full snapshot of every file. However, some files might be identical to those snapshotted in commit B, in which case they're literally shared with commit B. But importantly, commit C says, in its metadata, I have one parent and its hash ID is <insert real hash ID of B here>. So if we get Git to look at commit C (by giving Git its hash ID), Git can use that to find B.

That's how git log works: it starts with the current commit, whatever that is, and shows its metadata (author, log message, etc) as appropriate. Then, using the metadata, it moves on—or back, depending on how you look at it—to commit B. Of course, commit B is a commit, with snapshot and metadata, so git log can show B's metadata and move on (or back) to commit A. That's a commit as well, so git log can show it, and move ... whoops! The list of parent hash IDs for commit A is empty. There's nowhere left to go, so now git log stops.
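The same backwards walk can be done by hand. This is only a sketch of the idea, not how git log is actually implemented; the empty commits and their messages are made up so that the example stays short.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for name in A B C; do                  # three commits, as in the A <-B <-C drawing
    git commit -q --allow-empty -m "commit $name"
done

# Show a commit, then follow its parent, until there is no parent left.
commit=$(git rev-parse HEAD)
visited=""
while [ -n "$commit" ]; do
    subject=$(git log -1 --format=%s "$commit")
    echo "$subject"
    visited="$visited $subject"
    # "<hash>^" means "first parent of"; this yields nothing for the root commit.
    commit=$(git rev-parse --verify --quiet "$commit^" || true)
done
```

The loop prints the subjects newest first and stops at the commit whose parent list is empty.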

Should we want to see what changed in C, Git can retrieve B's hash ID from C's metadata and look at both snapshots. The de-duplication trick makes it easy for Git to eliminate all the identical files, and then Git just has to play a game of Spot the Difference with the differing files. The result is a git diff, showing the changed lines within the changed files. The same trick works for commit B, but for commit A, Git has to use a pretend commit with the empty tree as the "parent" commit. This makes the diff show all files as newly added.
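The empty-tree comparison for a root commit can be reproduced by hand. In this sketch the repository and the file README.md are hypothetical, and the empty tree's hash ID is computed rather than typed in.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README.md
git add README.md
git commit -q -m "commit A"            # the root commit: no parent

# Hash ID of a tree with no entries at all.
empty_tree=$(git hash-object -t tree /dev/null)

# Comparing the empty tree with the root commit lists every file as added ("A").
root_diff=$(git diff --name-status "$empty_tree" HEAD)
echo "$root_diff"
```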

There's one big hitch here. To make this work as described so far, Git would have to make us (humans) memorize the latest commit hash ID. Humans are no good at that. This is where branch names enter the picture.


[2] Guaranteeing this is mathematically impossible due to the pigeonhole principle, which means Git is eventually doomed to fail. The length of time it takes to reach accidental failure is determined by the size and effectiveness of the hash, and without deliberate attacks, the hash that Git uses so far is sufficient to make the chance of failure so low that we get to ignore it. Moreover, if it did fail, the result is not "universe explodes" or anything quite so drastic: it's just "can't add new commit". Still, Git is moving to an even bigger hash because it's now possible to construct malicious hash collisions using only a few hundred years of computation. As computers get faster, and computational clusters get larger, it may become possible to construct such collisions in mere hours, but moving from 160 bits to 256 bits pushes the problem back into the "forever" zone. (For technical reasons, adding 96 bits only raises the difficulty by a factor of 2^48, but that's 281,474,976,710,656, or almost 300 trillion.)
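The arithmetic is easy to check. The sketch below assumes the usual birthday-bound estimate, in which finding a collision in an n-bit hash takes on the order of 2^(n/2) attempts; that estimate is presumably the "technical reason" the extra 96 bits count for only 48.

```shell
old_bits=160
new_bits=256
# Under the birthday bound, only half of the added bits count against collisions.
exponent=$(( (new_bits - old_bits) / 2 ))
factor=$(( 1 << exponent ))
echo "2^$exponent = $factor"           # 2^48 = 281474976710656
```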


Branch names help us find commits

Let's take our three-commit repository and add, to the drawing, a branch name main:

A--B--C   <-- main

I have deliberately gotten lazy about drawing the backwards-pointing arrows between commits. They are part of each commit's metadata, and therefore cannot be changed once made. They always point backwards because a hash ID looks random and depends on, among other things, the exact second at which you make the commit: unless you know in advance when you will make a commit, you cannot know what its hash ID will be.[3] But the name main, well, we will see how it changes over time. Right now that name points to (i.e., holds the hash ID of) the latest commit in the repository, commit C. So we do not have to memorize C's hash ID: we just say main and Git looks it up for us!
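To see that a branch name really is just a stored hash ID, you can ask Git what the name holds. This is a sketch in a scratch repository with three made-up empty commits standing in for A, B, and C.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

for name in A B C; do
    git commit -q --allow-empty -m "commit $name"
done

git show-ref --heads                   # one line: <hash ID> refs/heads/main
tip=$(git rev-parse main)              # the hash ID stored under the name main
git log -1 --format='%H %s' "$tip"     # that hash ID is commit C
```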

Now, suppose you're on your one branch named main, and you make a new commit, in the usual way, with modifying files and git add and git commit. When you run git commit, Git will make a new commit, with a new, unique hash ID, but we'll just call the new commit D. Git will gather all the metadata it needs from your settings, your computer's clock, and a log message you must provide. It will add, to this metadata, the hash ID of existing commit C as the (single) parent for new commit D, so D will point back to C:

A--B--C
       \
        D

It will save a snapshot of all files as well (actually from Git's index, aka the staging area, but we won't cover that here), so D captures the files and the metadata. And then git commit does its special trick: since you're "on" branch main—as in, git status says on branch main—Git stores whatever D's hash ID is in the name main, so that main points not to C any more, but rather to D:

A--B--C
       \
        D   <-- main

and there's no reason to draw D on a separate line, so we can just write:

A--B--C--D   <-- main
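Here is a small sketch of that last step, again with invented empty commits: the name main holds one hash ID before git commit and a different one afterwards, and the new commit's parent is the old tip.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

for name in A B C; do
    git commit -q --allow-empty -m "commit $name"
done

before=$(git rev-parse main)           # hash ID of commit C
git commit -q --allow-empty -m "commit D"
after=$(git rev-parse main)            # hash ID of commit D
parent_of_new=$(git rev-parse 'main^') # D's recorded parent

echo "main moved from $before to $after"
[ "$parent_of_new" = "$before" ] && echo "D points back to C"
```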

[3] Even if you did know everything else, you'd have to know in advance the hash ID of the commit you're about to make right now, before you make it, so that you can put that hash ID into the hashing function that will make this new commit's child commit. So you end up having to make the parent commit first anyway. There's no practical way to make the parent "point forward" to the child, but it's trivially easy to make the child "point backwards" to the parent later.


Having more than one branch name

Now that we have:

A--B--C--D   <-- main

let's make a new branch name, such as experiment-1. A branch name, in Git, must point to some existing commit. Just as main now points to commit D, experiment-1 must point to some commit. Which one should we use? We can pick any of the four, but why not use the same one we're using now? If we don't say otherwise, that's exactly what Git will do. We end up with:

A--B--C--D   <-- experiment-1, main

Now that we have two names, we need a way to know which one we're "on": git status will say on branch ...—but what fills in the ... part? The trick Git uses here is to attach the special name HEAD to just one of these branch names, like this:

A--B--C--D   <-- experiment-1, main (HEAD)

This means we're on commit D, using the name main to find it. If we run git checkout experiment-1 or git switch experiment-1, we get:

A--B--C--D   <-- experiment-1 (HEAD), main

We're still on commit D, but now we're using the name experiment-1 to find it.
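You can watch HEAD move from one name to the other. In this sketch the four empty commits are placeholders for A through D.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

for name in A B C D; do
    git commit -q --allow-empty -m "commit $name"
done

git branch experiment-1                # new name, same commit as main
head_before=$(git symbolic-ref HEAD)   # refs/heads/main
git checkout -q experiment-1           # or: git switch experiment-1
head_after=$(git symbolic-ref HEAD)    # refs/heads/experiment-1

echo "$head_before -> $head_after"
git rev-parse main experiment-1        # the same hash ID, printed twice
```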

Let's suppose we make a new commit now. Just as before, we'll get:

A--B--C--D   <-- main
          \
           E   <-- experiment-1 (HEAD)

The last step of git commit was to write into the current branch name the hash ID for the new commit, which in this case is now commit E. So main still points to commit D, and experiment-1 now points to new commit E.

If we now use git checkout or git switch to switch back to commit D via branch name main, Git has to remove all the files that go with commit E. It needs to get us back to the state we have for commit D, by filling in the working files from commit D. That's safe to do because we just made commit E, so everything is safely saved away.

We can switch back and forth as much as we like, and Git will swap out the working files—the ones we see and work on / with—using the committed files, which are saved forever in the read-only commits.
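The swap is easy to observe in a scratch repository. The file names notes.txt and extra.txt below are invented for the demonstration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "version D" > notes.txt
git add notes.txt
git commit -q -m "commit D"

git checkout -q -b experiment-1
echo "version E" > notes.txt
echo "only in E" > extra.txt
git add notes.txt extra.txt
git commit -q -m "commit E"

git checkout -q main                   # working files now come from commit D
on_main=$(cat notes.txt)
extra_on_main=$( [ -e extra.txt ] && echo present || echo absent )

git checkout -q experiment-1           # and now from commit E again
on_exp=$(cat notes.txt)

echo "main: $on_main (extra.txt $extra_on_main), experiment-1: $on_exp"
```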

We can now create another new branch name, experiment-2 for instance. I'll draw experiment-1 on the top line and main in the middle for now:

           E   <-- experiment-1
          /
A--B--C--D   <-- main (HEAD), experiment-2

If we switch to experiment-2 now, Git has to swap out the commit-D files for ... the commit-D files. But that's pointless! Commit D's files obviously all match commit D's files. Git doesn't have to do anything, and it won't. And that's the secret that lets us skip git stash entirely: if we want to create and switch to a new branch right now, we can have all kinds of unsaved work that we have not committed. Creating a new branch, pointing to the current commit, and switching to it, just means "make a new name pointing to the same commit as the current name" and then "attach HEAD to the new name". It doesn't matter whether you do this in two steps:

git branch new
git checkout new      # or git switch new

or in one:

git checkout -b new   # or git switch -c new

In both cases, Git just makes the new name and switches to it, without touching anything you're working on.
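A quick way to check this claim is to compare git status before and after creating the branch. This is a sketch with made-up files: one modified tracked file and one untracked file.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "committed" > file.txt
git add file.txt
git commit -q -m "commit D"

echo "uncommitted edit" >> file.txt    # modified but not committed
echo "brand new" > untracked.txt       # never added at all

before=$(git status --porcelain)
git branch new
git checkout -q new                    # or: git switch new
after=$(git status --porcelain)

[ "$before" = "$after" ] && echo "working tree untouched"
git rev-parse main new                 # both names point to the same commit
```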

What if you want to start from somewhere else?

Suppose you've been working for a while and have this:

          I--J   <-- exp-1 (HEAD)
         /
...--G--H   <-- main

You'd like to try again from commit H on a new experiment. If you have unsaved work, what you need to do is save it now: add and commit to get:

          I--J--K   <-- exp-1 (HEAD)
         /
...--G--H   <-- main

Give yourself a reasonable description for this work-in-progress commit, for when you come back to it later (if ever). Then you can use:

git checkout -b exp-2 main

or:

git switch -c exp-2 main

to create a new branch exp-2 pointing to commit H and switch to that new branch:

          I--J--K   <-- exp-1
         /
...--G--H   <-- exp-2 (HEAD), main

You're now on commit H, not commit K, and you can now start a whole new series of experimental commits. After the first such you will have:

          I--J--K   <-- exp-1
         /
...--G--H   <-- main
         \
          L   <-- exp-2 (HEAD)

and your exp-2 branch will grow from there.
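Put together, the sequence might look like the sketch below. The repository, the file idea.txt, and the commit messages are hypothetical, and a single commit stands in for the I--J part of the drawing.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

git commit -q --allow-empty -m "commit H"

git checkout -q -b exp-1
echo "first idea" > idea.txt
git add idea.txt
git commit -q -m "commit J"

echo "half-finished" >> idea.txt                     # unsaved work on exp-1
git commit -q -a -m "WIP: half-finished first idea"  # commit K saves it

git checkout -q -b exp-2 main          # or: git switch -c exp-2 main
git log --oneline --graph --all        # exp-1 has the WIP commit; exp-2 and main sit on H
```

Because the work in progress was committed first, switching to exp-2 removes idea.txt from the working tree without losing anything: it is still there on exp-1.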

Conclusion

What's in Git are the commits—which you can eventually send to some other Git repository using git push—plus some names, such as branch names, to let you find the commits. The files you work on / with are not in Git. They get copied out of commits, and once you're finished editing them and git add them, they get copied back into Git to go into the next commit. But the files you see and work on / with aren't in Git at all!

The idea behind git stash is to make some commits—these are the only things Git has for storing files—but to make those commits on no branch at all. That's where Git saves the work you have been doing. The concept itself is relatively straightforward (make secret commits to save the work, then remove the work from the working tree by doing a git reset --hard). It's just that because Git doesn't actually make commits from the working tree, and the git stash command has a lot of special features, the actual stash code has historically been a source of bugs. With the working tree files not being in Git until added-and-committed, these bugs have historically had some bad cases of destroying working files (the files didn't get into the special on-no-branch commits, but then got erased).
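You can see these on-no-branch commits for yourself. This is a sketch in a scratch repository with a hypothetical file.txt; it only inspects what git stash leaves behind.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"        # scratch repository (hypothetical)
git init -q
git symbolic-ref HEAD refs/heads/main  # call the first branch "main" on any Git version
git config user.name "Example User"
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "commit v1"

echo "work in progress" > file.txt
git stash > /dev/null                  # save the work and clean the working tree

stash_type=$(git cat-file -t refs/stash)         # the stash is a commit object
branches=$(git branch --contains refs/stash)     # no branch contains it
cleaned=$(cat file.txt)                          # working tree is back to "v1"
echo "stash is a $stash_type, on branches: [$branches], file.txt: $cleaned"

git stash pop > /dev/null              # restore the saved work
cat file.txt
```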

With stashes being hard to see and hard to reason about, I find it better to just make ordinary commits: you can use git rebase -i later to get rid of any temporary commits. If I can avoid git stash entirely, I find that best, and for this particular case—starting a new branch at the current commit—it's really easy to avoid git stash.
