I'm attempting to push my output files from Databricks to Github. (From my understanding, git integration with Databricks is only for notebooks, and not for other files such as CSV. When you add a Databricks repo, a dialog appears saying only db-notebooks are cloned.)
I can successfully push to Github once, but after pushing I can no longer commit again. #commitmentissues
The error is that git cannot append to .git/logs/HEAD:
fatal: cannot update the ref 'HEAD': unable to append to '.git/logs/HEAD': Operation not supported
What I've done
- Initialize git from Databricks notebook:
git init
- Tell git who I am:
git config user.email "<email>"
git config user.name "<name>"
- Add and commit file:
git add test.txt && git commit -m "message"
This works!
- Add remote:
git remote add origin https://github.com/<user>/<repo>.git
- Push to remote. Did this from RStudio in Databricks (rather than notebook) so that I could interactively add Github username and personal access token:
git push -u origin master
This works!
- Add a new file:
git add file2.txt
- Commit:
git commit -m "message"
This fails.
Error:
fatal: cannot update the ref 'HEAD': unable to append to '.git/logs/HEAD': Operation not supported
Why does pushing to Github change git's ability to append to .git/logs/HEAD? How can I work around this?
Research
- This question is also about trying to push to Github from Databricks but it fails at a different step in the process, and is using Databricks Git Integration, which I am not.
- This Github issue returns the same error, but I got lost once they started talking about formats.
Answer:
The problem, in the end, is not to do with git but with mounted storage on Databricks.
During a commit, git appends to its log files (such as .git/logs/HEAD). Databricks, however, does not support appending to files on mounted storage.
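One way to check whether a particular directory has this limitation is to try an append there directly. This is only a rough sketch: can_append is a hypothetical helper name (not a git or Databricks command), and it assumes the restriction shows up as an ordinary failed append from the shell, which may not be true of every mount.

```shell
# Hypothetical helper: report whether files in a directory can be appended to.
# Exit status: 0 = append worked, 1 = append failed, 2 = could not create the probe file.
can_append() {
    probe="$1/.append-probe.$$"

    # Creating a new file is expected to work even where appending does not.
    printf 'first\n' > "$probe" || return 2

    # Appending to the existing file is the operation git needs for its logs.
    if { printf 'second\n' >> "$probe"; } 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi

    rm -f "$probe"
    return 1
}
```

Running can_append /tmp and then can_append on the mounted directory that holds the repository should show whether the two locations behave differently; a non-zero status on the mount would be consistent with the error above.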
The solution, then, is to host the repository on unmounted storage (e.g. in /tmp, as suggested by the previous link).
@torek, in the question's comments, points out that the working tree could remain on mounted storage, with only the repo being hosted on unmounted storage, using git init's --separate-git-dir= option:
/tmp/
    project-repo
/dbfs/mnt/
    project-working-tree
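A minimal sketch of how that layout might be set up follows. init_split_repo is a hypothetical helper name used only for illustration; the one git feature it relies on is git init --separate-git-dir, which puts the repository in one directory and leaves a small .git pointer file in the working tree.

```shell
# Hypothetical helper: working tree in one place, git's own files in another.
#   $1 = working tree, e.g. /dbfs/mnt/project-working-tree (mounted storage)
#   $2 = git directory, e.g. /tmp/project-repo (unmounted storage)
init_split_repo() {
    work_tree="$1"
    git_dir="$2"

    mkdir -p "$work_tree" || return 1

    # The repository (objects, refs, logs) is created in $git_dir.
    # The working tree only receives a ".git" file pointing at it.
    git init --separate-git-dir="$git_dir" "$work_tree"
}

# Example call using the paths from the layout above (left commented out so
# that sourcing this snippet has no side effects):
# init_split_repo /dbfs/mnt/project-working-tree /tmp/project-repo
```

After that, the usual git config, git add and git commit commands from the question can be run from inside the working tree. Git then writes its logs under the separate git directory rather than on the mount.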