How to use Git commands to add multiple separate folders to GitHub such that each folder can be upda

Time:06-15

So I am new to this whole GitHub and Git thing. I recently learned the basics of Git (adding, pushing, pulling, cloning, etc.). My intro to Java professor asked me to make a GitHub repository for all my class homework. She told me to organize it so that there is a separate folder for each homework, and each homework folder contains multiple source files.

So I set up my files like this: Java (main folder) -> Hw1, Hw2, Hw3, etc. How would I do this using Git? All of these folders should be in both my local and GitHub repositories, and I should be able to make changes to them separately.

Thank you in advance. I am stuck.

CodePudding user response:

Let's start with some basics. You already understand that your computer uses a tree-structured file system: that is, a directory (or folder—the terms are now interchangeable) holds files and/or more directories/folders, which in turn hold more files and/or folders, etc. Windows natively uses a backwards slash \ to separate the various components, so that you might have:

java\hw1\main.java
java\hw1\sub.java
java\hw2\main.java

and so on. Windows can use forward slashes (some commands may use them for other purposes, but they do work in file names), and all non-Windows OSes tend to use forward slashes, which are easier to type. Git also uses forward slashes so that's what I'll do here.

(Aside: Windows and macOS by default use "case insensitive but case preserving" rules, so that if you create a file named readme.txt, you can later open it using the name ReadMe.txt or README.TXT but it remains named in all-lowercase. Git, by contrast, is usually case-sensitive and thinks that readme.txt, ReadMe.txt, and README.TXT are three different file names. This causes endless grief on such systems,[1] and sometimes the best, or at least easiest, way to avoid all problems here is to completely avoid uppercase letters everywhere. To the extent that you can use java instead of Java, hw1 instead of Hw1, and so on, I would encourage you to do so.)

When you ask Git to create a new, empty repository using git init,[2] Git creates a hidden folder named .git. This hidden folder will contain all of Git's files: here, Git will store its main two databases. We'll talk about those in just a moment. The place where Git creates .git is whatever your current working directory is, so if you are in java/hw1 and run git init, Git creates java/hw1/.git. If you are in java and run git init, Git creates java/.git.
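To see this for yourself, here is a small sketch that runs git init in a throwaway folder. The temporary directory and the java/hw1 names are just stand-ins for your own layout, and I'm assuming a Unix-style shell such as git-bash.

```shell
# Throwaway directory; java and hw1 mirror the layout from the question.
work=$(mktemp -d)
mkdir -p "$work/java/hw1"

cd "$work/java"
git init -q        # run from java/, so Git creates java/.git
ls -d .git         # the hidden folder now exists here

# From inside a subfolder, ask Git where the top of the working tree is.
cd hw1
git rev-parse --show-toplevel
```

The last command prints the folder that holds .git, which is a quick way to check which repository a given folder belongs to. Here hw1 has no .git of its own, so it belongs to the repository in java.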

Note that java/.git and java/hw1/.git are different folder path names, and therefore you can create two repositories. You do not want to do this, but that's what you did. (I base this claim on this comment.) We'll come back to "how to fix this" soon.


[1] In particular, someone using Linux can literally create three different files that differ only in case, stuff all three into a commit in a Git repository, and leave you with a problem when you go to check out this commit on a Windows system. If you're used to the system mapping from typed-in-lowercase to the matching case, and you ask an editor to create java/hw1/thing.java on Linux, it might actually create a java and hw1 right next to your existing Java and Hw1. Since those are different directories they can store different files with the exact same names as those in Java/Hw1/, including name-case. Git will happily store all these files, and Windows often cannot extract such a commit properly.

[2] Note that git init will first check to see if you're already in some existing repository. In this case, rather than creating a new repository, Git will "reinitialize" the existing repository. In most cases "reinitializing" like this has no effect at all.


The main thing to know about Git and a Git repository

A Git repository—or what I sometimes call the repository proper—consists mainly of two databases. One is usually much bigger. It contains commits and other supporting Git objects. These objects all have hash IDs (or more formally, object IDs or OIDs) that Git must have in order to retrieve the objects from the database. This could force humans to memorize Git commit hash IDs, but that's a bad plan: hash IDs are very large, very random-looking, and impossible for humans to remember in general.

For this reason, a Git repository contains that second, usually much smaller, database. In this database, Git stores names: branch names, tag names, remote-tracking names, and many other kinds of names. These names are for you (and other humans) to use. Each name stores one hash ID, but that's enough to make everything work. So you'll use a branch name, like main or master. This name holds the hash ID of the latest commit, which allows Git to retrieve that commit.
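As a rough illustration of a name holding a hash ID, the sketch below makes one commit in a throwaway repository and then asks Git to turn the branch name into the hash ID it stores. The identity passed with -c is a placeholder so that the example does not depend on your global configuration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo example > README.txt
git add README.txt
# Placeholder identity, only for this throwaway commit.
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "first commit"

branch=$(git symbolic-ref --short HEAD)   # main or master, depending on your setup
by_name=$(git rev-parse "$branch")        # the hash ID stored under the branch name
by_head=$(git rev-parse HEAD)             # the hash ID of the current commit
echo "$branch -> $by_name"

git for-each-ref    # lists each name in the names database with the hash ID it holds
```

Both lookups give the same hash ID here, because the branch name points to the commit we just made.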

Each commit stores two things:

  • A commit stores a full snapshot of every file (that Git knew about, that is) at the time you, or whoever, made that commit. The files inside the commit are stored in a special, read-only, Git-only, compressed and de-duplicated form, that only Git can read, and literally nothing can write. (This uses some of those "supporting objects" I mentioned; the files are actually stored in the objects database as "Git objects".) Because nothing but Git can use these files, the files in a commit are useless on their own. We'll see in just a moment how we work with these files.

  • Meanwhile, that same commit that's storing a snapshot, also stores some metadata, or information about the commit itself: who made it (you, probably), and when, for instance. To make "branches"—a poorly-defined word in Git (see What exactly do we mean by "branch"?)—work, the commit's metadata contains the hash ID of the previous commit.

This "contains previous commit's hash ID" is how Git stores history: the branch name, e.g., main, lets Git find the last commit you made, and then by reading that commit, Git can find the hash ID of the second-to-last commit. For instance, suppose the hash ID of the last commit is H (it's actually some big ugly hexadecimal number so we're just using H to stand in for it). Then we say that the name main points to commit H. But commit H contains the hash ID of an earlier, or parent, commit: let's call that one G. We say that H points to G, and we can draw that:

        <-G <-H   <--main

Since G is a commit, it has one of these points-to pointers sticking out of it, too. By reading commit G's metadata, Git can find the raw hash ID of its parent; let's call that commit F:

... <-F <-G <-H   <--main

So main points to H, which points to G, which points to F, which points to ... well, this goes on until we get back to the very first commit ever—commit A perhaps—which, being first, can't point backwards and therefore simply doesn't.

What this means is that instead of one hash ID, each commit stores, in its metadata, a list of previous-commit hash IDs. The list can be empty, and is for that first commit. It can also have more than one hash ID, but we won't cover this case here. Most commits in most repositories are "ordinary" commits and have exactly one parent, though.
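You can inspect this chain directly. The following sketch builds three commits in a throwaway repository (the messages A, B, C and the commit helper function are made up for the example) and then reads the parent hash IDs back out of the metadata.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a placeholder identity.
commit() { git -c user.name="Example" -c user.email="example@example.com" commit -q -m "$1"; }

echo one > file.txt;   git add file.txt; commit "A"
echo two > file.txt;   git add file.txt; commit "B"
echo three > file.txt; git add file.txt; commit "C"

git cat-file -p HEAD                         # raw commit: tree, parent, author, committer, message
git log --format='%h <- parents: %p  (%s)'   # each commit with its list of parents

tip_parent=$(git log -1 --format=%P HEAD)                    # parent hash stored in the last commit
second=$(git rev-parse HEAD~1)                               # the second-to-last commit
root_parents=$(git log --format=%P --max-parents=0 HEAD)     # the first commit has no parents
count=$(git rev-list --count HEAD)                           # commits reachable by walking backwards
```

The parent recorded in the last commit is exactly the second-to-last commit, and the parent list of the very first commit comes back empty.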

Your "working tree"

A repository, then, stores names—branch names for instance—that help Git find commits for us (we only have to remember the branch names), and stores commits that then store files. But the stuff in the commits (along with the actual commits themselves) is all completely read-only. Git must do this to make the hashing scheme work. What good are stored files if we can't write on them? Moreover, only Git can read them, so what good are they if we can't even read them?

This is where your working tree comes in. Most Git repositories have a working tree.[3] The working tree of a repository is, quite simply, where you do your work. And, as we saw earlier, if you use git init in some directory to create a new, totally-empty repository and then make an initial commit:[4]

mkdir new
cd new
echo example > README.txt
git init
git add README.txt
git commit

you will wind up with a hidden .git folder here in the new/ folder we just made (mkdir new) and entered (cd new). The working tree for this Git repository in new/.git is new/, and the file we created—README.txt—in that working directory is now also stored in the first (and so far only) commit in that repository.

If we now modify the one file, and/or add a new file, and use git add and git commit appropriately, we'll get a second commit that stores (forever[5]) the new version of that file, plus any file we added. That second commit has, as its parent commit, the first commit, which stores (forever) the earlier version with just the one file in it.

The second commit is now our current commit, and is now the last commit on the main or master branch (whatever its name is).
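Here is a sketch of those two commits, under the assumption that the second one changes README.txt and adds a file I'm calling notes.txt (a made-up name). It then lists what each snapshot contains.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a placeholder identity.
commit() { git -c user.name="Example" -c user.email="example@example.com" commit -q -m "$1"; }

echo example > README.txt
git add README.txt
commit "first commit"

echo "more text" >> README.txt     # modify the existing file
echo "notes" > notes.txt           # add a new file
git add README.txt notes.txt
commit "second commit"

first_files=$(git ls-tree -r --name-only HEAD~1)   # files in the first snapshot
second_files=$(git ls-tree -r --name-only HEAD)    # files in the second snapshot
old_readme=$(git show HEAD~1:README.txt)           # README.txt as stored in the first commit
echo "first:  $first_files"
echo "second:" $second_files
```

The first commit still holds only the original README.txt, untouched by the later changes, while the second commit holds both files.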

Git allows us to check out any commit we have stored in the repository. When we do that, Git will erase from our working tree the files that go with the current commit. It will, instead, install into our working tree the files that go with the newly selected commit—which then becomes the current commit.

In this way, we can "go back in time", any time we like, to any older version, stored as a commit in the big database. All we have to do is find its commit hash ID (for which git log comes in handy, for instance). That's not what we'll focus on right now though.
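A minimal sketch of going back and forth, assuming a throwaway repository with two versions of one file (the "version 1" and "version 2" contents are invented for the example):

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Hypothetical helper: commit with a placeholder identity.
commit() { git -c user.name="Example" -c user.email="example@example.com" commit -q -m "$1"; }

echo "version 1" > README.txt; git add README.txt; commit "first"
echo "version 2" > README.txt; git add README.txt; commit "second"

branch=$(git symbolic-ref --short HEAD)   # remember the branch name so we can return to it
first=$(git rev-parse HEAD~1)             # hash ID of the older commit (git log shows it too)

git checkout -q "$first"                  # working tree now holds the files from the first commit
at_first=$(cat README.txt)

git checkout -q "$branch"                 # back to the latest commit on the branch
at_latest=$(cat README.txt)

echo "$at_first / $at_latest"
```

Checking out a raw hash ID like this puts Git in what it calls "detached HEAD" mode, which is fine for looking around; checking out the branch name again returns you to normal.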


[3] The exception here is a so-called bare repository. We won't cover these here.

[4] These are Unix-shell-style commands as I don't use Windows myself, but this should work in git-bash, which is just a port of bash to Windows for use with Git. You can do all this in PowerShell or even CMD.EXE instead, but some command details might change.

[5] Well, forever, or as long as the commit itself continues to exist. If we remove the commit, we remove its snapshot. This is actually kind of hard to do! However, if we remove the repository proper, we destroy the two databases, which removes all commits, and this is pretty easy to do.


"Nested" repositories: the thing you didn't want, but made

Given that the computer—the host operating system, which is in your case Windows, but this is also true of macOS and Linux—demands and uses a tree-structured file system, we can set up a structure like this:

java
  .git
    <various Git repository control files and databases>
  hw1
    .git
      <various Git repository control files and databases>
    main.java
  hw2
    .git
      <various Git repository control files and databases>
    main.java

and so on. Here we have one repository per hw directory plus one overall containing repository in the java directory.
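If you're not sure whether your own folders ended up like this, one way to check is to search for .git folders below the top level. This sketch recreates the layout above in a throwaway directory and then runs that search; on your machine you would only need the find command, run from inside your java folder.

```shell
work=$(mktemp -d)
mkdir -p "$work/java/hw1" "$work/java/hw2"
git init -q "$work/java"        # the outer repository
git init -q "$work/java/hw1"    # nested repository
git init -q "$work/java/hw2"    # nested repository

cd "$work/java"
# -mindepth 2 skips java/.git itself and reports only .git folders inside subfolders.
nested=$(find . -mindepth 2 -type d -name .git | sort)
echo "$nested"
```

Any path this prints is a repository sitting inside the outer one.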

But here's the problem: Git literally cannot store a Git repository inside a Git commit.[6] Instead of doing so, the "outer" repository (in this case the one in java/.git, whose working tree is the java/* files) will store what Git calls a submodule using what Git calls a gitlink. To store a submodule correctly, you must use git submodule add, not git add; git add creates or updates only the gitlink, which is sort of half a submodule.
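The gitlink is visible in Git's index. In this sketch (throwaway folders, placeholder identity, and a stand-in main.java), running plain git add on a folder that has its own .git records a single entry with mode 160000 instead of the files inside it.

```shell
work=$(mktemp -d)
mkdir -p "$work/java/hw1"

# Inner repository with one commit.
cd "$work/java/hw1"
git init -q
echo "class Main {}" > main.java
git add main.java
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "hw1"

# Outer repository: plain git add on the nested folder.
cd "$work/java"
git init -q
git add hw1 2>/dev/null          # Git warns about an embedded repository; hidden here

entry=$(git ls-files --stage hw1)
echo "$entry"                    # one entry for hw1 itself, not for hw1/main.java
mode=${entry%% *}                # first field of the entry is the mode
inner_files=$(git ls-files hw1/main.java)
```

Mode 160000 is how Git marks a gitlink. Note that main.java does not appear in the outer repository's index at all, which is why the homework files seem to go missing when you push the outer repository.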

If someone does want submodules (but you don't), this git submodule add method is how to make them. The result is that when you clone the java repository, you get files, plus the magic gitlinks, that Git will need in order to run additional git clone commands, one for each submodule. This way, the person who clones the java repository can run git submodule update --init to run a bunch more git clone commands. But again, that's not what you want.


[6] There are some tricks to get around this problem if you really need to do it, but it's not a good idea in general. The recent safe.directory stuff is an outgrowth of a security issue that resulted in a CVE when someone discovered such a trick. The tricks that Git allows involve renaming the .git directory; the ones it doesn't allow, or accidentally allowed in the past, result in CVEs.
