I have this project added to a git repository with the main branch consisting of a number of commits.

      A--->B--->C--->D--->E

As I'm doing some experimental modifications on commits A through E, I want to keep a copy of the main branch's commits in a different branch, main-copy. That way, if anything goes wrong while I play with the commits in main, I don't have to go through restoring anything; I still have an intact copy of main. If I create the main-copy branch on top of commit E, where HEAD currently is, I will not actually make a copy of main, because commits A to E are common to both branches. So I need to create a branch on top of commit A and redo the same commits in this branch. The final repository would look something like this:

     A--->B--->C--->D--->E
     |
     `--->B'--->C'--->D'--->E'

I have no idea how to do this without going back to commit A, making another branch, and redoing commits B through E one by one. Is there a dedicated command for this in Git?

Answer:

TL;DR

All you need to do is create a branch name before you start experimenting.
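As a minimal sketch of that one step, the script below builds a throwaway repository with the history A through E and then creates the second name. The temporary directory, the identity settings, and the empty commits are assumptions made only so the example runs on its own; main and main-copy are the names from the question.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the real project (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.com"

# Build the history A--B--C--D--E on main (empty commits keep it short).
for msg in A B C D E; do
    git commit -q --allow-empty -m "$msg"
done

# The whole "copy": one new name that points at the same commit E.
git branch main-copy main

# Both names now resolve to the same hash ID.
main_tip=$(git rev-parse main)
copy_tip=$(git rev-parse main-copy)
echo "main:      $main_tip"
echo "main-copy: $copy_tip"
```

No commits are duplicated here. The new name simply remembers where E is, which is all the "copy" needs to do.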

Long-ish

... I'm doing some experimental modifications on commits A through ...

No, you're not.

I don't mean you aren't trying to do that. I mean it's literally impossible to do that, so no matter what you try, that doesn't actually happen.

The trick here is this: commits are immutable, but we don't find them by their immutable true names. The true name of any given commit is its hash ID, and using that hash ID will find that commit. That commit can never be changed, no matter how hard you try; using that name again later will find that commit (with one caveat: see below).

A branch name, in Git, is a database entry, but you can think of it as a small file, because it's sometimes implemented as a small file. This "file" (or actual file) contains one string, and that one string the entry provides is the hash ID of the last commit we want to claim is "in the branch".

That is, given a string of commits:

... <-F <-G <-H

which ends at commit H, if we create a branch name entry and stuff the raw hash ID of commit H into that name, we get:

...--F--G--H   <-- somebranch

Running git log somebranch shows us commit H, then commit G, then commit F, and so on.
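To see this for yourself, here is a small sketch in a scratch repository. The repository, the identity settings, and the empty commits named F, G, and H are all made up for illustration; only git branch, git rev-parse, and git log matter.

```shell
#!/bin/sh
set -eu

# Scratch repository with three commits F, G, H (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
for msg in F G H; do
    git commit -q --allow-empty -m "$msg"
done

# Create a name that stores the hash ID of the current tip, H.
git branch somebranch

# The name resolves to one raw hash ID...
tip=$(git rev-parse somebranch)
echo "somebranch -> $tip"

# ...and git log walks backwards from that commit, parent by parent.
history=$(git log --format=%s somebranch | paste -s -d ' ' -)
echo "$history"   # prints: H G F
```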

If we attempt to change some of these commits—such as G—what happens instead is that we immediately get a new commit, a G', whose parent is F. New commit G' has no child, and to find G' we will use a name, such as a branch name, because humans are so bad at using hash IDs:

...--F--G--H   <-- somebranch
      \
       G'  <-- newbranch

If the purpose of making "modified" commit G' was to edit just the text of the commit message in G, new commit G' will hold as its snapshot the same snapshot that G holds (and the two snapshots will be 100% de-duplicated and thus the one for G' takes no extra space). We then have to make a new H' copy of H where the one thing we change in H' is its saved parent hash ID: commit H' must point backwards to our new G':

...--F--G--H   <-- somebranch
      \
       G'-H'  <-- newbranch
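The first half of this can be checked in a throwaway repository: "editing" a commit message produces a new commit with the same snapshot, and the original commit survives. The sketch below uses git commit --amend on the tip commit as the simplest such edit. The file name, messages, and repository are assumptions for illustration, and it does not go on to build H'.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "F"
echo "world" >> file.txt
git commit -q -a -m "G: original message"

# Record G's commit hash and the hash of its snapshot (tree).
old=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# "Edit" the message. Git writes a new commit G' rather than altering G.
git commit -q --amend -m "G: fixed message"
new=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')

echo "G  = $old (tree $old_tree)"
echo "G' = $new (tree $new_tree)"

# The old commit is still there under its hash ID, message unchanged.
old_msg=$(git log -1 --format=%s "$old")
echo "old message: $old_msg"
```

The two commit hashes differ while the two tree hashes match, which is the de-duplication described above.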

Once we're in this situation, if we stuff the hash ID of H' into the name somebranch, we get:

...--F--G--H   ???
      \
       G'-H'  <-- somebranch, newbranch

If we now delete the temporary name newbranch, it seems as though we've edited commit G somehow:

       G--H   ???
      /
...--F--G'-H'  <-- somebranch

We literally can't find commit H unless we've memorized its hash ID somewhere.

Of course, the place to memorize its hash ID is obvious: we just create a new branch name before we do whatever it is that we do that "modifies" G to make G'. Then, once the name somebranch locates H' instead of H, our branch we created earlier locates H:

       G--H   <-- original-somebranch
      /
...--F--G'-H'  <-- somebranch

and git log original-somebranch shows us the original commits, which still exist.
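Put together, the workflow might look like the sketch below. It is only an illustration: the repository and commits are made up, and the "rewrite" is a crude stand-in (a reset followed by two fresh commits) for whatever git rebase -i or similar command you actually run.

```shell
#!/bin/sh
set -eu

# Scratch repository with F--G--H on somebranch (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/somebranch
git config user.name "Example"
git config user.email "example@example.com"
for msg in F G H; do
    git commit -q --allow-empty -m "$msg"
done

# Step 1: remember H under a second name before touching anything.
git branch original-somebranch

# Step 2: stand-in for a rewrite such as git rebase -i:
# move somebranch back to F, then build replacement commits G' and H'.
git reset -q --hard HEAD~2
git commit -q --allow-empty -m "G'"
git commit -q --allow-empty -m "H'"

# Each name now finds its own chain of commits.
rewritten=$(git log --format=%s somebranch | paste -s -d ' ' -)
original=$(git log --format=%s original-somebranch | paste -s -d ' ' -)
echo "somebranch:          $rewritten"
echo "original-somebranch: $original"
```

If the experiment goes badly, git reset --hard original-somebranch while on somebranch puts the name back on H.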

What if we forget to create the name?

If we forget to create a name like original-somebranch, we must find the hash ID of old commit H. We have some time to do this: a Git repository will not discard an old commit, even if it's unfindable by ordinary means, for "some time". How long is that "some time"? Well, that depends on many things. Some hosting systems (e.g., GitHub) never discard old commits. Git by default sets three expiration times though: 14 days, 30 days, and 90 days. After one of these three times, old commits that can't be found by any name can go away.

You're safest, of course, if you make the new name first, before doing git rebase -i or whatever it is that you are doing. But most actions place a copy of a commit hash ID into another name, or a reflog entry, and here the 30 or 90 day default kicks in. To be safe we should just consider the shorter expiration: 30 days after we were able to find H, if we haven't bothered to create a name for it, Git may get around to removing it. Git's process for doing this is kind of lazy so it could stick around for many more months or years, but it has "become vulnerable" after the 30 day expiry of the reflog entry.
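If you want to check whether these defaults have been overridden on your machine, the sketch below reads the three settings that, as far as I know, correspond to them: gc.pruneExpire (two weeks), gc.reflogExpireUnreachable (30 days), and gc.reflogExpire (90 days). An unset key means Git's built-in default applies; the loop and the report variable are just illustration.

```shell
#!/bin/sh
set -eu

# Report each expiry setting, or note that it is unset.
report=""
for key in gc.pruneExpire gc.reflogExpireUnreachable gc.reflogExpire; do
    # git config --get exits non-zero when the key is unset, hence "|| true".
    value=$(git config --get "$key" || true)
    report="${report}${key} = ${value:-<unset, built-in default applies>}
"
done
printf '%s' "$report"
```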

You can view reflog entries with git reflog, which is actually a front end that runs git log -g. See the documentation for git reflog for details. Note that in a sea of random-looking hash IDs and commit messages, it can be hard to tell one old commit from another. That's the other reason to make the name before you start working.
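Here is a sketch of such a recovery in a scratch repository, assuming the reflog is enabled (the default for an ordinary repository with a working tree). The commits and the name recovered are hypothetical. In a real repository you would read the git reflog output to decide which entry you want, rather than assuming it is @{1}.

```shell
#!/bin/sh
set -eu

# Scratch repository with F--G--H on somebranch (hypothetical setup).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/somebranch
git config user.name "Example"
git config user.email "example@example.com"
for msg in F G H; do
    git commit -q --allow-empty -m "$msg"
done

# Kept only so the result can be verified at the end.
lost=$(git rev-parse HEAD)

# Move somebranch back to F without saving a name for H first.
git reset -q --hard HEAD~2

# The branch's reflog still records where the name used to point.
git reflog show somebranch

# somebranch@{1} is the value the name held before its latest update: H.
git branch recovered 'somebranch@{1}'
echo "recovered -> $(git rev-parse recovered)"
```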

If a commit is removed via the "garbage collector" git gc, and you give Git its true name (hash ID), Git will say that it doesn't recognize that hash ID. The same thing happens if you just make up a hash ID—although if you "make up" a shortened one, there's a good chance that this is the valid prefix of some existing commit, so you want to make sure you remember the full hash ID. If you use a branch name (or tag name or other name) to hold the hash ID, that also stops git gc from removing it, since the commit is now find-able by some human-readable name. This also protects all commits "behind" (earlier than) this commit, since those too are find-able.