How are WSL POSIX paths converted to UNC for Windows native applications?

Time:04-15

I found out that if I execute a Windows native program (PE) from WSL2, accessing a POSIX path magically works.

For example, I can access /dev/random if I execute my program from WSL bash, but if I execute the same program from CMD (command-prompt), I cannot.

I must understand the mechanism which allows this! :)

The test program is fairly simple:

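#include <stdio.h>

int main(void) {
    /* Prints a non-null pointer if the open succeeds, a null pointer otherwise. */
    FILE *f = fopen("/dev/urandom", "r");
    printf("%p\n", (void *)f);   /* %p expects a void pointer */
    if (f) fclose(f);
    return 0;
}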

If I execute this from inside the WSL instance, it succeeds opening the device.

If I execute this via CMD, however, it fails.

When I look at API Monitor, I can see that the fopen("/dev/urandom", "r") call is converted to CreateFileA("\\wsl.localhost\Ubuntu\dev\urandom", ...).

First question: What component is doing this conversion?

If I replace the fopen with CreateFile it fails... so it must be something in the stdio functions.

Second question: How does it know what WSL instance is the parent?

I saw no API query and no environment variable that would give me a hint. The only abnormality I can see is the opening of \\wsl.localhost\Ubuntu\tmp during process startup.

Third question: Does this survive nesting within the process tree?

When I execute cmd.exe from inside WSL, then execute my test program, it fails.

However, I wrote my own native Windows program that executes my test program, and the test program succeeds, so this behavior does survive the process tree.

Can anyone explain the mechanism that allows this magic to work? What API? What component is doing the translation? Where is the context stored? How is it queried? How does it know which distro to look up?

I tried to ask this in the Microsoft discussions[1] and got no response, so I am hoping someone here may be able to provide a hint.

[1] https://github.com/microsoft/WSL/discussions/8212

Answer:

Short summary. I believe:

  • /init handles the conversion of the working directory that gets passed to the Windows executable.
  • When a path starts with a directory separator character (e.g. / or \), fopen considers it to be relative to the root of the volume of the working directory.

For example:

  • If you execute your code from /home/<username>
  • ... then the working directory will be \\wsl.localhost\Ubuntu\home\<username>.
  • ... the "volume" (share name in this case) will be \\wsl.localhost\Ubuntu\
  • ... so /dev/random is opened as \\wsl.localhost\Ubuntu\dev\random.

Try this, however:

  • cd /mnt/c (or any location inside that mount)
  • Call your program via /full/path/to/the.exe.
  • The fopen fails in my testing (and I assume will for you as well), because ...
  • ... the working directory that gets passed in is C:\ (or a subdirectory thereof).
  • ... thus the volume name is also C:\.
  • ... and fopen attempts to open C:\dev\random, which doesn't exist.
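
Both walk-throughs above can be modelled with a few lines of string handling. This is only a sketch of the behavior I observed, not how Windows actually implements path resolution; the names volume_length and resolve_rooted are made up for illustration, and a rooted path that is itself a UNC path is simply rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* True for either Windows directory separator. */
static int is_sep(char c) { return c == '\\' || c == '/'; }

/* Length of the volume prefix of an absolute Windows path:
 * "\\server\share" for UNC paths, "C:" for drive paths, 0 if neither. */
static size_t volume_length(const char *cwd) {
    if (is_sep(cwd[0]) && is_sep(cwd[1])) {
        size_t i = 2;
        while (cwd[i] && !is_sep(cwd[i])) i++;   /* server name */
        if (!cwd[i]) return 0;                    /* no share component */
        i++;
        while (cwd[i] && !is_sep(cwd[i])) i++;   /* share name */
        return i;
    }
    if (cwd[0] && cwd[1] == ':') return 2;       /* drive letter */
    return 0;
}

/* Hypothetical helper: resolve a rooted path such as "/dev/random" against
 * the volume of cwd and write the result to out. Returns 0 on success,
 * -1 if cwd has no volume, path is not rooted, or out is too small. */
int resolve_rooted(const char *cwd, const char *path, char *out, size_t outsz) {
    assert(cwd && path && out);
    size_t vol = volume_length(cwd);
    if (vol == 0 || !is_sep(path[0]) || is_sep(path[1])) return -1;
    if (vol + strlen(path) + 1 > outsz) return -1;
    memcpy(out, cwd, vol);                 /* keep only the volume part */
    strcpy(out + vol, path);               /* append the rooted path */
    for (char *p = out; *p; p++)           /* normalize to backslashes */
        if (*p == '/') *p = '\\';
    return 0;
}
```

With a working directory of \\wsl.localhost\Ubuntu\home\<username>, resolve_rooted keeps \\wsl.localhost\Ubuntu and appends the rooted path; with C:\Users it keeps only C:, which is why the second experiment ends up under C:\dev.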

More detail:

What component is doing this conversion?

That part is (I believe) fairly easy to answer, although not definitively. As mentioned in this answer, when you launch a Windows executable in WSL, it uses a handler registered with binfmt_misc (see cat /proc/sys/fs/binfmt_misc/WSLInterop) to call the WSL /init.

Unfortunately, WSL's /init is closed source (at least at the time of writing), so it is difficult to get full insight into what happens during the launch process. But I think we can safely say that the handler (/init) is the component that converts the path before the Windows process receives it.

One interesting thing to note is that the wslpath command is mapped to that same binary via symlink. When called with the name wslpath, the /init binary will do OS path conversions. For example:

wslpath -w /dev/random
# \\wsl.localhost\Ubuntu\dev\random

But here's the real question ...

So we know that /init knows how to convert the path, but exactly what does it convert when launching a Windows binary? That's a bit tricky, but I think we can surmise that what gets converted is the path of the current working directory.

Try these simple experiments:

$ cd /home
$ wslpath -w .
\\wsl.localhost\Ubuntu\home
$ powershell.exe -c "Get-Location"

Path
----
Microsoft.PowerShell.Core\FileSystem::\\wsl.localhost\Ubuntu\home

$ cd /dev
$ wslpath -w .
\\wsl.localhost\Ubuntu\dev
$ powershell.exe -c "Get-Location"

Path
----
Microsoft.PowerShell.Core\FileSystem::\\wsl.localhost\Ubuntu\dev

$ cd /mnt/c
$ wslpath -w .
C:\
$ powershell.exe -c "Get-Location"

Path
----
C:\
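
The mapping seen in these three experiments can be written down as a small C function. To be clear, this is a guess at the observable behavior and not the real /init or wslpath code. The function name to_windows_path is hypothetical, and it assumes the default /mnt/<letter> drive mounts, an absolute input path, and a distro name supplied by the caller.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the mapping seen above, NOT the real /init code.
 * "/mnt/<letter>[/...]" becomes "<LETTER>:\...";
 * any other absolute path becomes "\\wsl.localhost\<distro>\...".
 * Returns 0 on success, -1 on a relative path or a too-small buffer. */
int to_windows_path(const char *posix, const char *distro,
                    char *out, size_t outsz) {
    assert(posix && distro && out);
    if (posix[0] != '/') return -1;               /* absolute paths only */
    int n;
    if (strncmp(posix, "/mnt/", 5) == 0 &&
        isalpha((unsigned char)posix[5]) &&
        (posix[6] == '/' || posix[6] == '\0')) {
        /* Drive mount: skip "/mnt/c/" and keep whatever follows. */
        const char *rest = posix[6] ? posix + 7 : "";
        n = snprintf(out, outsz, "%c:\\%s",
                     toupper((unsigned char)posix[5]), rest);
    } else {
        /* Path inside the distro: prefix the UNC share for that distro. */
        n = snprintf(out, outsz, "\\\\wsl.localhost\\%s%s", distro, posix);
    }
    if (n < 0 || (size_t)n >= outsz) return -1;   /* error or truncated */
    for (char *p = out; *p; p++)                  /* normalize separators */
        if (*p == '/') *p = '\\';
    return 0;
}
```

Called with "/home" and "Ubuntu" it produces \\wsl.localhost\Ubuntu\home, and with "/mnt/c" it produces C:\, matching the wslpath output in the experiments. A custom automount root or non-drive mounts under /mnt are not handled.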

And another question

So here's my question -- When did the Windows API get smart about concatenating UNC working directories and paths that start with a directory separator? I can find no documentation on that behavior, but it obviously works. And it's not specific to WSL. I observed the same concatenation behavior when using a UNC working directory for a regular network share.

Even more curious is that .NET's path handling is not this smart about UNC concatenation. From the doc, the behavior we observe with fopen is expected for DOS paths, but for UNC:

UNC paths must always be fully qualified. They can include relative directory segments (. and ..), but these must be part of a fully qualified path. You can use relative paths only by mapping a UNC path to a drive letter.

And I was able to confirm that behavior in PowerShell with a simple Get-Content.

Back to our regularly scheduled ...

But that aside, you don't even need your sample code to demonstrate this. You can see the same behavior by calling notepad.exe from within WSL:

$ cd /etc
$ notepad.exe /home/<username>/testfile.txt
# Creates or opens the proper file using \\wsl.localhost\Ubuntu\home\<username>\testfile.txt

$ cd /mnt/c/Users
$ notepad.exe /home/<username>/testfile.txt
# Results in "The system cannot find the path specified", because it is really attempting to open C:\home\<username>\testfile.txt, and the `home` directory (likely) doesn't exist at that path.

And your other related questions:

How does it know what WSL instance is the parent?

In case it's not clear by now, I think it's safe to say that the WSL /init knows what WSL instance you are in since it is "orchestrating" the whole thing anyway.

Does this survive nested within process tree?

As long as no process in the chain changes the working directory that the next process inherits, yes. However, CMD does not support a UNC path as its current directory and falls back to the Windows directory when started from one, so if it's in the process chain, your program will fail.
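
If you want to check what working directory each process in the chain actually received, a small diagnostic helps. This is a minimal sketch; print_working_directory is a name I made up, and it relies only on _getcwd from the Microsoft CRT (or getcwd on POSIX, so the same file also builds inside WSL).

```c
#include <assert.h>
#include <stdio.h>

#ifdef _WIN32
#include <direct.h>
#define get_cwd _getcwd      /* Microsoft CRT name */
#else
#include <unistd.h>
#define get_cwd getcwd       /* POSIX name */
#endif

/* Write the current working directory to stream.
 * Returns 0 on success, -1 if it cannot be read or written. */
int print_working_directory(FILE *stream) {
    char buf[4096];
    assert(stream != NULL);
    if (get_cwd(buf, (int)sizeof buf) == NULL) {
        perror("getcwd");
        return -1;
    }
    return fprintf(stream, "cwd: %s\n", buf) < 0 ? -1 : 0;
}
```

Call it at the top of main in the test program and in any intermediate launcher. Going by the experiments above, I would expect a \\wsl.localhost\... path when the chain starts in a WSL directory, and a drive path once cmd.exe has been in between.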
