Cannot remove a remote branch whose name contains diacritics

Time:03-16

I made a mistake and named a branch with some diacritics (let's say temp-à-définir) and I pushed it to the remote.

Obviously, the next git fetch gave me:

error: * Ignoring funny ref 'refs/remotes/origin/temp-?-d?finir' locally

Then I renamed the local branch to a name without diacritics, with no problem. But I cannot remove the remote branch.

$ git push origin :temp-?-d?finir
fatal: invalid refspec ':temp-?-d?finir'

$ git push origin :temp-à-définir
error: unable to delete 'temp-à-définir': remote ref does not exist

So, I have some questions:

  1. Why does git ls-remote --heads show my funny branch, while git branch --remote does not?

  2. How do I remove this remote branch? I tried all the commands I saw in the multiple posts I found on SO about remote branches. I guess I have to replace the à and é with some funky codes, but I cannot figure out which ones.

Any idea how to achieve this? Thanks in advance.

CodePudding user response:

TL;DR

You'll probably need to have someone who has direct access to the server figure out what to delete, and delete it. That's probably a file in the refs/heads/ directory inside the .git directory of their copy of the repository.

Long(ish)

A remote is another Git repository. As such, if you can log in on the remote, you can run Git commands there.[1] This will allow you to find out how they have spelled the branch name in their repository.

The fundamental issue here is one that is only partially solved by Unicode: characters other than the 7-bit ASCII subset (also known as "US-ASCII") have no single standard representation everywhere. Hence the words "définir", as in your case, or "schön" in German, don't have one single common agreed-upon encoding that everyone uses everywhere.

Git, internally, tries a little bit to use UTF-8, but it relies on the OS and the native C library to get there, and if the UTF-8 support in the OS and/or the local libc is lacking, things may go wrong. Moreover, Git stores branch names in one or both of two places:

  • a "flat file" (i.e., crappy) database consisting of <hash-ID, name> pairs in .git/packed-refs; and/or
  • tree-structured directory-and-file areas provided by the OS, in .git/refs.
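Someone with shell access to the server could look at both storage places and print the ref names as raw bytes, which shows exactly how the name was spelled there. The following is only a rough sketch: the function name list_ref_names is made up for this example, and it assumes the classic files-based ref storage described above (a packed-refs file plus a refs directory).

```python
import os

def list_ref_names(git_dir):
    """Return every ref name found under git_dir as raw bytes, sorted.

    Hypothetical helper for illustration; assumes files-based ref storage.
    """
    git_dir = os.fsencode(git_dir)
    names = set()

    # Packed refs: lines of "<hash-ID> <name>".
    # Lines starting with "#" are headers, lines starting with "^" are peeled tags.
    packed = os.path.join(git_dir, b"packed-refs")
    if os.path.isfile(packed):
        with open(packed, "rb") as fh:
            for line in fh:
                line = line.rstrip(b"\n")
                if not line or line[:1] in (b"#", b"^"):
                    continue
                _hash_id, _, name = line.partition(b" ")
                names.add(name)

    # Loose refs: one file per ref somewhere under refs/.
    refs_root = os.path.join(git_dir, b"refs")
    for dirpath, _dirs, files in os.walk(refs_root):
        for fname in files:
            rel = os.path.relpath(os.path.join(dirpath, fname), git_dir)
            names.add(rel.replace(os.sep.encode(), b"/"))

    return sorted(names)
```

Calling list_ref_names(".git") in a normal clone (or list_ref_names(".") in a bare repository) and printing each entry gives output such as b'refs/heads/d\xc3\xa9finir', so non-ASCII bytes are visible as hex escapes instead of being rendered by the terminal.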

If we look at Unicode spellings of "définir", we find that there are two:

LATIN SMALL LETTER D (U+0064)
LATIN SMALL LETTER E WITH ACUTE (U+00E9)
LATIN SMALL LETTER F (U+0066)
LATIN SMALL LETTER I (U+0069)
LATIN SMALL LETTER N (U+006E)
LATIN SMALL LETTER I (U+0069)
LATIN SMALL LETTER R (U+0072)

and:

LATIN SMALL LETTER D (U+0064)
LATIN SMALL LETTER E (U+0065)
COMBINING ACUTE ACCENT (U+0301)
LATIN SMALL LETTER F (U+0066)
... the rest is the same ...

That is, the é character can occupy one Unicode code point (e-with-acute), or two: e, followed by a combining acute.
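Both spellings can be listed with Python's standard unicodedata module. This is just an illustrative sketch; the helper name describe is made up for this example.

```python
import unicodedata

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

# Normalize explicitly so the result does not depend on how this file was typed.
composed = unicodedata.normalize("NFC", "définir")
decomposed = unicodedata.normalize("NFD", "définir")

for label, spelling in (("composed", composed), ("decomposed", decomposed)):
    print(f"{label}: {len(spelling)} code points")
    for entry in describe(spelling):
        print("   ", entry)
```

The composed spelling has 7 code points and the decomposed one has 8, even though both normally render identically on screen.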

Converting these two different representations (one is "NFC" or "composed" and the other is "NFD" or "decomposed") to UTF-8, we get, respectively:

64 (`d`), c3 a9 (`é`), 66 (`f`), 69 (`i`), 6e (`n`), 69 (`i`), 72 (`r`)

and:

64 (`d`), 65 (`e`), cc 81 (combining `´`), 66 (`f`), 69 (`i`), 6e (`n`), 69 (`i`), 72 (`r`)
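These byte sequences can be reproduced with a few lines of Python. This is a small sketch; the helper name utf8_hex is made up for this example. It also shows why a byte-for-byte comparison of the two spellings fails unless one side is normalized first.

```python
import unicodedata

def utf8_hex(text):
    """Return the UTF-8 encoding of text as space-separated hex bytes."""
    return " ".join(f"{byte:02x}" for byte in text.encode("utf-8"))

nfc = unicodedata.normalize("NFC", "définir")
nfd = unicodedata.normalize("NFD", "définir")

print(utf8_hex(nfc))  # 64 c3 a9 66 69 6e 69 72
print(utf8_hex(nfd))  # 64 65 cc 81 66 69 6e 69 72

# The raw bytes differ, so a bytewise name lookup treats these as two names.
print(nfc.encode("utf-8") == nfd.encode("utf-8"))  # False

# After normalizing both sides to the same form, they compare equal.
print(unicodedata.normalize("NFC", nfd) == nfc)    # True
```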

So that gives us two ways the OS and C library could store .git/refs/heads/définir as a file name in the file system, assuming UTF-8 (if the OS uses some other encoding, as Windows generally does, we have other variables at play).

(Besides NFC and NFD there are also NFKC and NFKD. See, e.g., Normalizing Unicode. For Unicode tables, there are many sites; here's one.)

When you use git push to ask a server to create or delete a reference name, your Git software sends the other side a C-style string holding the ref. (The "wire encoding", as it were, is mostly just a raw C string: the ref and its hash ID are space-separated, so the ref must be a valid Git ref and therefore won't contain a space.) Nothing really says whether this is UTF-8, and if so, whether it's NFC or NFD or whatever.

The server then may wind up storing the ref in the file system, which might mangle it a bit. That probably is the case for your server.

When the server sends this back to your Git client, your Git client may re-interpret the possibly-mangled ref as something else. That seems to be the case here: in particular your server has replaced whatever encoded à and é with characters that are not legal in a refname (a server can do this over the wire without noticing that it has done so; the client will detect it and call it a "funny ref", as yours did).
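To see which bytes the server actually advertises for the branch, rather than what the terminal chooses to display, the output of git ls-remote --heads can be captured as raw bytes. The sketch below makes some assumptions: the function names are made up for this example, the remote is called origin, and git ls-remote is assumed to print one tab-separated hash and ref name per line without quoting the name.

```python
import subprocess

def parse_ls_remote(output):
    """Split raw `git ls-remote` output into (hash, ref-name) byte pairs."""
    pairs = []
    for line in output.splitlines():
        if not line:
            continue
        hash_id, _, ref = line.partition(b"\t")
        pairs.append((hash_id, ref))
    return pairs

def remote_heads(remote="origin"):
    """Run `git ls-remote --heads <remote>` and return the parsed byte pairs."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote],
        capture_output=True,  # keep stdout as bytes, undecoded
        check=True,
    )
    return parse_ls_remote(result.stdout)
```

Running remote_heads() inside the affected clone and printing each ref name with repr() shows any byte outside ASCII as a \x.. escape, which makes it possible to tell NFC, NFD, and a non-UTF-8 encoding apart.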

Normally, you'd expect to be able to delete the "bad" ref with git push --delete and the ref as you spelled it on your side—but if the name-mangling happened after the server Git put it into the file system, the name-mangling occurs too late in the process: the server thinks it's asked to delete an unmangled name, which does not exist. You'd have to send the mangled name. However, your client Git won't let you request that, because that's an illegal ref name.

(Someone is, right now, working on some of the Unicode issues in Git, in preparation for a future Git release—2.36 or later. There are a lot of ugly and tricky cases here and I don't expect this to be fixed any time soon, but there may be some progress.)


[1] This assumes they're using the command-line Git system. There are alternative implementations of Git that speak Git protocol, but don't offer the command-line commands, in which case all bets are off.
