Why do I need to add the `--remote` to git's submodule when I specify the branch in the .gitmod

Time:01-04

I want to pull/update my submodules on the right branch. Running git submodule update pulls/updates the submodules, but it ends up on the wrong branch, even though the branch I ALWAYS want to use is specified in the .gitmodules file.

It only works when I pass --remote (but then I don't know what unintended consequences that might have for the rest of my submodules).

I want to update my submodules exactly as specified in my .gitmodules file. How do I do this?

e.g.

[submodule "pytorch-meta-dataset"]
    path = pytorch-meta-dataset
    url = [email protected]:brando90/pytorch-meta-dataset.git
    branch = hdb
[submodule "meta-dataset"]
    path = meta-dataset
    url = [email protected]:brando90/meta-dataset.git

Is this what I should be running?

git submodule update
git submodule update --remote
git submodule init
git submodule status 

I did read the documentation for --remote:

--remote
           This option is only valid for the update command. Instead of using the superproject’s recorded SHA-1 to update the submodule, use the status of the submodule’s
           remote-tracking branch. The remote used is branch’s remote (branch.<name>.remote), defaulting to origin. The remote branch used defaults to the remote HEAD, but the branch
           name may be overridden by setting the submodule.<name>.branch option in either .gitmodules or .git/config (with .git/config taking precedence).

           This works for any of the supported update procedures (--checkout, --rebase, etc.). The only change is the source of the target SHA-1. For example, submodule update
           --remote --merge will merge upstream submodule changes into the submodules, while submodule update --merge will merge superproject gitlink changes into the submodules.

           In order to ensure a current tracking branch state, update --remote fetches the submodule’s remote repository before calculating the SHA-1. If you don’t want to fetch, you
           should use submodule update --remote --no-fetch.

           Use this option to integrate changes from the upstream subproject with your submodule’s current HEAD. Alternatively, you can run git pull from the submodule, which is
           equivalent except for the remote branch name: update --remote uses the default upstream repository and submodule.<name>.branch, while git pull uses the submodule’s
           branch.<name>.merge. Prefer submodule.<name>.branch if you want to distribute the default upstream branch with the superproject and branch.<name>.merge if you want a more
           native feel while working in the submodule itself.

My install script ends up looking ridiculous:

# -- gitsubmodules
# - set up pytorch-meta-dataset git submodule
cd ~/diversity-for-predictive-success-of-meta-learning/
# adds the submodule to the .gitmodules file & pull the project
git submodule add -f -b hdb --name pytorch-meta-dataset [email protected]:brando90/pytorch-meta-dataset.git pytorch-meta-dataset/
git submodule update --init --recursive --remote pytorch-meta-dataset

# - set up meta-dataset git submodule
# adds the submodule to the .gitmodules file & pull the project
git submodule add -f -b master --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/
# - git submodule update to fetch all the data from that project
git submodule update --init --recursive --remote meta-dataset

# - initialize your local configuration file
git submodule init
# - check the submodules
git submodule status

Why do I need to specify the same thing so many times? What is the point of the .gitmodules file at all, then? It can't even update things properly without messing up the rest of the submodules.

Look at the branches:

(meta_learning) brandomiranda~/diversity-for-predictive-success-of-meta-learning ❯ git submodule status                                         
 ca81edbf5093ec5ea1a1f5a4b31ec4078825f44b meta-dataset (arxiv_v1-200-gca81edb)
 6e60161962ae3fa309335da7aa1c675c75ecca54 pytorch-meta-dataset (heads/hdb)

They don't even match my .gitmodules:

[submodule "pytorch-meta-dataset"]
    path = pytorch-meta-dataset
    url = [email protected]:brando90/pytorch-meta-dataset.git
    branch = hdb
[submodule "meta-dataset"]
    path = meta-dataset
    url = [email protected]:brando90/meta-dataset.git
    branch = master


Extra: Why does git submodule status not match the output of git branch of my submodule?

Why does it still not work even if I specified the --remote?

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule add -f -b hdb --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/

Cloning into '/Users/brandomiranda/ultimate-utils/tutorials_for_myself/my_git/meta-dataset'...
remote: Enumerating objects: 2947, done.
remote: Counting objects: 100% (740/740), done.
remote: Compressing objects: 100% (65/65), done.
remote: Total 2947 (delta 689), reused 675 (delta 675), pack-reused 2207
Receiving objects: 100% (2947/2947), 3.17 MiB | 4.51 MiB/s, done.
Resolving deltas: 100% (2248/2248), done.
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule init

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule update --init

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule update --init --remote

(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule status
 ca81edbf5093ec5ea1a1f5a4b31ec4078825f44b meta-dataset (arxiv_v1-200-gca81edb)
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule update --init --recursive --remote meta-dataset
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ git submodule status                                         
 ca81edbf5093ec5ea1a1f5a4b31ec4078825f44b meta-dataset (arxiv_v1-200-gca81edb)
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git ❯ cd meta-dataset 
(meta_learning) brandomiranda~/ultimate-utils/tutorials_for_myself/my_git/meta-dataset ❯ git branch
* hdb

CodePudding user response:

It seems I can specify the branch in the .gitmodules file, but when I do git submodule update and variants (e.g. --all, --recursive, etc) it doesn't pull the Git submodule to the right branch.
This is obvious from the git submodule status.
How do I pull and make sure it's in the right branch?
Otherwise what is the point of specifying the branch then?

By default, a submodule does not switch to the branch it would pull from. It only checks out a SHA1, which leaves the submodule with a detached HEAD. That SHA1 is:

  • either the one registered in the index of the parent repository: that is what a submodule is, a remote repository URL plus a gitlink (a SHA1 recorded as a special entry in the index)
  • or the one of a remote-tracking branch, meaning the HEAD SHA1 of a remote repository branch, specified as submodule.<name>.branch in .gitmodules

(That is the main source of "unintuitiveness".)

With --remote, it fetches from the remote and sets the HEAD of the submodule to the tip of the specified remote-tracking branch (still as a detached HEAD, not as a checked-out local branch).
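
The difference between the two sources of the SHA1 can be reproduced locally. The following is only a sketch under stated assumptions: it builds throwaway repositories in a temporary directory, and the names upstream, super, fresh and sub, as well as the g helper, are placeholders invented for the illustration, not part of the original setup.

```shell
#!/bin/sh
# Sketch with throwaway local repositories; every name here is a placeholder.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Helper: demo identity, and allow file-based submodule clones
# (restricted by default in recent Git versions).
g() { git -c user.name=demo -c user.email=demo@example.invalid -c protocol.file.allow=always "$@"; }

# 1. Stand-in "remote" repository whose branch is named hdb.
g init -q upstream
g -C upstream symbolic-ref HEAD refs/heads/hdb
g -C upstream commit -q --allow-empty -m "first"

# 2. The superproject records the submodule at the "first" commit, tracking hdb.
g init -q super
g -C super submodule --quiet add -b hdb "$tmp/upstream" sub
g -C super commit -q -m "add submodule"
recorded=$(git -C super rev-parse HEAD:sub)

# 3. The remote branch advances after the superproject recorded its commit.
g -C upstream commit -q --allow-empty -m "second"
tip=$(git -C upstream rev-parse hdb)

# 4. Fresh clone + plain update: the recorded SHA1 is checked out, HEAD is detached.
g clone -q "$tmp/super" fresh
cd fresh
g submodule --quiet update --init
after_plain=$(git -C sub rev-parse HEAD)
if git -C sub symbolic-ref -q HEAD >/dev/null; then state_plain=branch; else state_plain=detached; fi

# 5. --remote: the tip of origin/hdb is used, as named by "branch = hdb" in .gitmodules.
g submodule --quiet update --remote
after_remote=$(git -C sub rev-parse HEAD)

echo "recorded in superproject: $recorded"
echo "after plain update:       $after_plain ($state_plain)"
echo "remote tip of hdb:        $tip"
echo "after update --remote:    $after_remote"
```

In this sketch the plain update should land on the commit recorded by the superproject, and only the --remote run should move the submodule to the newer commit on hdb.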

To quickly set all your submodules to an actual branch:

git submodule foreach -q --recursive \
  'git switch \
  $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master)'

The $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master) part:

  • executes a command in a subshell: $(command) is known as command substitution, and it allows the output of a command to be used as an argument to another command
  • gets the submodule.$name.branch of the current ($name) submodule, as visited by the git submodule foreach command
  • or returns "master" if submodule.$name.branch is not set for that submodule: cmd1 || cmd2 executes cmd2 if cmd1 fails
  • note: $toplevel is the path to the superproject, so $toplevel/.gitmodules is the path to its .gitmodules file
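
The fallback logic can be tried in isolation, outside of git submodule foreach. This is a minimal sketch, assuming a throwaway .gitmodules file; the submodule names with-branch and no-branch, the example.invalid URLs and the branch_for helper are hypothetical and exist only for the illustration.

```shell
#!/bin/sh
# Throwaway directory; the submodule names and URLs below are placeholders.
tmp=$(mktemp -d)
cat > "$tmp/.gitmodules" <<'EOF'
[submodule "with-branch"]
    path = with-branch
    url = https://example.invalid/with-branch.git
    branch = hdb
[submodule "no-branch"]
    path = no-branch
    url = https://example.invalid/no-branch.git
EOF

# Same lookup as in the foreach command, wrapped in a helper function.
# git config exits non-zero when the key is missing, so "|| echo master" runs.
branch_for() {
  git config -f "$tmp/.gitmodules" "submodule.$1.branch" || echo master
}

with_branch=$(branch_for with-branch)
no_branch=$(branch_for no-branch)
echo "with-branch -> $with_branch"
echo "no-branch   -> $no_branch"
```

The first lookup returns the branch configured in the file, while the second falls through to the echo because the key does not exist.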

Replace master with main, depending on the default branch naming convention of your remote repositories.

That won't scale if I have hundreds of submodules. I specified it in my .gitmodules file.

That is what git submodule foreach is for: scaling.


See also "Git: track branch in submodule but commit in other submodule (possibly nested)".
A script like the one below can reliably update/pull all the submodules where a branch is specified.

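# $name is the key used in .gitmodules; $toplevel is the immediate superproject.
# git switch needs --detach to check out a bare commit.
git submodule foreach --recursive \
  'b=$(git config -f "$toplevel/.gitmodules" "submodule.$name.branch"); \
   case "$b" in \
     "") git switch --detach "$sha1";; \
      *) git switch "$b" && git pull origin "$b";; \
   esac'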

Make sure to use the latest Git version (2.39): submodule issues have been fixed over time.


When is the --init needed for git submodules update?

I usually use --init with git submodule update, simply so that I do not have to think about the corner case where the submodule has not been initialized yet.
If it has been, --init does nothing anyway.
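
That behaviour can be checked on a fresh clone. The sketch below is an illustration under assumptions, not the original setup: it uses throwaway local repositories, and the names upstream, super, fresh and sub, as well as the g helper, are placeholders.

```shell
#!/bin/sh
# Sketch with throwaway local repositories; every name here is a placeholder.
set -e
tmp=$(mktemp -d)
cd "$tmp"
# Helper: demo identity, and allow file-based submodule clones
# (restricted by default in recent Git versions).
g() { git -c user.name=demo -c user.email=demo@example.invalid -c protocol.file.allow=always "$@"; }

g init -q upstream
g -C upstream commit -q --allow-empty -m "first"
g init -q super
g -C super submodule --quiet add "$tmp/upstream" sub
g -C super commit -q -m "add submodule"

g clone -q "$tmp/super" fresh
cd fresh

# Not initialized yet: no submodule.sub.url in .git/config, so update skips it.
url_before=$(git config --get submodule.sub.url || echo "unset")
g submodule --quiet update
if [ -e sub/.git ]; then populated_before=yes; else populated_before=no; fi

# --init registers the submodule first, then update clones and checks it out.
g submodule --quiet update --init
url_after=$(git config --get submodule.sub.url)
if [ -e sub/.git ]; then populated_after=yes; else populated_after=no; fi

# Running it again on an already initialized submodule leaves HEAD where it was.
head_1=$(git -C sub rev-parse HEAD)
g submodule --quiet update --init
head_2=$(git -C sub rev-parse HEAD)

echo "url before init: $url_before, populated: $populated_before"
echo "url after init:  $url_after, populated: $populated_after"
```

Without --init, the update skips the unregistered submodule and leaves its directory empty, which is the corner case that always passing --init avoids.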

CodePudding user response:

My full tested end-to-end example with comments:

# https://stackoverflow.com/questions/74988223/why-do-i-need-to-add-the-remote-to-gits-submodule-when-i-specify-the-branch

# -- pretend you've added the submodules so far
git submodule add -f -b hdb --name meta-dataset [email protected]:brando90/meta-dataset.git meta-dataset/
git submodule add -f -b hdb --name pytorch-meta-dataset [email protected]:brando90/pytorch-meta-dataset.git pytorch-meta-dataset/

# - init the local configuration file & clone each submodule at the commit recorded in the main repository
#   ref: https://youtu.be/wTGIDDg0tK8?t=119, https://stackoverflow.com/questions/44366417/what-is-the-point-of-git-submodule-init
git submodule init
git submodule update --init
#git submodule update --init --recursive --remote

git submodule status

# - for each submodule pull from the right branch according to .gitmodule file
# ref: doc for "foreach" cmd: https://git-scm.com/docs/git-submodule/#Documentation/git-submodule.txt-foreach--recursiveltcommandgt
# ref: https://stackoverflow.com/questions/74988223/why-do-i-need-to-add-the-remote-to-gits-submodule-when-i-specify-the-branch#74994315
# note: The command has access to the variables $name, $sm_path, $displaypath, $sha1 and $toplevel...
# note: $toplevel is: $toplevel is the absolute path to the top-level of the immediate superproject.
# note: execute a command in a subshell $(...) ($(command) is known as command substitution. It allows the output of a command to be used as an argument to another command. )
# note: get the submodule.$name.branch of the current ($name) submodule, as visited by the git submodule foreach command.
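# note: "|| echo master" is the fallback when no branch is set; use main instead if that is the remote's default branch.
git submodule foreach -q --recursive \
  'git switch \
  $(git config -f $toplevel/.gitmodules submodule.$name.branch || echo master)'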

# - check that the switch above worked for one of the submodules, see: https://stackoverflow.com/questions/74998463/why-does-git-submodule-status-not-match-the-output-of-git-branch-of-my-submodule
# note: if the output below mentions origin: "origin" typically refers to the remote repository associated with your local repository.
git submodule status
cd meta-dataset
git branch  # should show hdb
cd ..

credit to VonC!


One last lingering issue: Why does git submodule status not match the output of git branch of my submodule?
