Portable way to implement variadic arguments in kernel space?-CodePudding

I am wondering if it is possible to implement the variadic macros in C or assembly.

I would prefer to have at least va_start() be a C macro but looks like this might not be possible. I have seen other answers to different questions saying it is not possible to do in C because you have to rely on undefined behaviour.

For context I am writing a kernel and I do not want to rely on any specific C89 compiler or unix-like assembler. Building the source with any C compiler is important for the project. Keeping it simple is another goal, unfortunately supporting something like variadic arguments seems to be complex on some architectures (amd64 ABI).

I know the __builtin_va_start(v,l), __builtin_va_arg(v, l), etc. macros exist but these are only available to specific compilers?

Right now I have the kernel printf(, ...) and panic(, ...) routines written in assembly (i386 ABI) which setup the va_list (pointer to first va argument on the stack) and pass it to vprintf(, va_list) which then uses the va_arg() macro (written in C). This does not rely on any undefined or implementation defined behaviour but I would prefer that all the macros are written in C.

CodePudding user response：

Summary: Just #include <stdarg.h> and use va_start and friends as you normally would. A standard-conformant C compiler will support this, even without what we normally think of as a "C library", and it is perfectly usable in a kernel that must run on the bare metal without OS support. This is also the most portable solution, and avoids needing an architecture-, compiler- or ABI-dependent solution.

Of course when writing a kernel, you are used to not using library facilities like the functions from <stdio.h>, <stdlib.h>, and even <string.h> (printf, malloc, strcpy, etc), or having to write your own. But <stdarg.h> is in a different category. Its functionality can be provided by the compiler without OS support or extensive library code, and is in some sense more a part of the compiler/language than the "library".

From the point of view of the C standard, there are two kinds of conforming implementations (see C17 section 4, "Conformance"). Application programmers mostly think about conforming hosted implementations, which must provide printf and all that. But for a kernel or embedded code or anything else that runs on the bare metal, what you want is a conforming freestanding implementation (I'll write CFI for short). This is, informally speaking, "just the compiler" without "the standard library". But there are a few standard headers whose contents a CFI must still support, and <stdarg.h> is one of them. The others are things like <limits.h>, <stddef.h> and <stdint.h> that are mainly constants, macros and typedefs.

(This same distinction has existed all the way back to C89, with the same guarantee of <stdarg.h> being available.)

If your kernel will build with any CFI, that's pretty much the gold standard of portability for a kernel. In fact, you'll be pretty hard-pressed not to use some more compiler-specific feature at some points (inline assembly is awfully useful, for instance). But <stdarg.h> doesn't have to be one of them; you're really not giving up any portability by using it. You can expect it to be supported by any usable compiler targeting any given architecture, and that includes cross compilers (which will be configured to use the correct header for the target). For instance, in the case of a GNU system, <stdarg.h> ships with the gcc compiler itself, and not with the glibc standard library.

As some further assurance, until very recently, the Linux kernel itself used <stdarg.h> in precisely this way. (About a month ago there was a commit to create their own <linux/stdarg.h> file, which just copy-pastes from an old version of gcc's <stdarg.h> and defines the macros as their gcc-specific __builtin versions. Linux only supports building with gcc anyway, so this doesn't hurt them. But my best guess is that this was done for licensing reasons - the commit message emphasizes that they copied a GPL 2 version - rather than based on anything technical.)

By contrast, writing your variadic functions in assembly will naturally tie you to that specific architecture, and they'd be one more thing to be rewritten if you ever want to port to another architecture. And trying to access variadic arguments on the stack from C, with tricks like arg = *((int *)&fixed_arg 1), is (a) ABI-dependent, (b) only possible at all for ABIs which actually pass args on the stack, which these days isn't much besides x86-32, and (c) is undefined behavior that might be "miscompiled" by some compilers. Finally, things like __builtin_va_start are strictly compiler-dependent (gcc and clang in this case), and using <stdarg.h> is no worse because gcc's <stdarg.h> simply contains macros like #define va_start __builtin_va_start.

CodePudding user response：

Since you are highlighting kernel space, is it right that you want a user space function which is implemented via some sort of kernel call and is variadic? This is a bit problematic; as a typical kernel entry point transitions the flow of control onto a kernel stack. Your va_start(), va_arg() implementations would have to be aware of how to traverse to the user's stack, and possibly map bits of a register save area into the vector.

An easier approach would be to have the user function:

int ufunc(char *fmt, ...) {
    va_list v;
    int n;
    va_start(v, fmt);
    n = __ufunc(fmt, v);
    va_end(v);
    return n;
}

And implement __ufunc in the kernel. Traditionally this is how the execl and execv family of functions co-operate to make handy interfaces, but only use one kernel call.

Your kernel will still have a bit of work dealing with the user stack though. For example, I could craft a va_list value for your call that caused the kernel to read out some private data. But if you are able to sort that the va_list points somewhere valid, and whatever processing you are doing with va_arg() supplied values are also valid, you would be able to use the stock compiler provided implementation.

Do note that if the user program used a different calling convention than your kernel, you could be in for a bit of work. For example, microsoft ignored the published ABI for the amd64, so that might cause a problem.