I have a re-entrant C function whose wasm output is not "thread-safe" when using imported shared memory, because the function makes use of an aliased stack that lives in the shared linear memory at a hard-coded position.
I'm aware that multithreading is not fully supported yet, and that if I want to use multiple instances of the same module concurrently, avoiding crashes and data races is my responsibility, but I accept the challenge.
My X problem is: my code is not thread-safe, and I need to make it thread-safe by giving each instance its own non-overlapping stack.
My Y problem is: I'm trying to modify __stack_pointer so I can implement the stack separation, but it doesn't build. I have tried declaring it as extern unsigned char __stack_pointer; but the linker throws the following error:
wasm-ld: error: symbol type mismatch: __stack_pointer
>>> defined as WASM_SYMBOL_TYPE_GLOBAL in <internal>
>>> defined as WASM_SYMBOL_TYPE_DATA in senseless.o
Since maybe I'm not supposed to touch that pointer directly, I'm considering other solutions as well (see below).
Working example:
#define WASM_EXPORT __attribute__((visibility("default"))) extern "C"
#define WASM_IMPORT extern "C"
struct senseless
{
unsigned c[1024];
unsigned __attribute__((noinline)) compute(unsigned seed) { return c[seed % 1024]; }
};
WASM_EXPORT unsigned compute_senseless(unsigned seed)
{
senseless h;
h.c[5] = seed;
return h.compute(seed);
}
After compilation, the resulting implementation in WAST is:
(module
  (type (;0;) (func (param i32) (result i32)))
  (import "env" "memory" (memory (;0;) 2 2 shared))
  (func (;0;) (type 0) (param i32) (result i32)
    (local i32)                  ;; int l; // address of h
    (global.set 0                ;; SP = l  (4096 = sizeof(senseless))
      (local.tee 1               ;; where l = SP - 4096
        (i32.sub (global.get 0)
                 (i32.const 4096))))
    (i32.store offset=20         ;; linear_memory[l + 5 * 4] = seed
      (local.get 1) (local.get 0))
    (i32.load                    ;; return_value = linear_memory[l + (seed & 1023) * 4]
      (i32.add (local.get 1)
        (i32.shl
          (i32.and (local.get 0) (i32.const 1023))
          (i32.const 2))))
    (global.set 0                ;; SP = l + 4096
      (i32.add (local.get 1) (i32.const 4096))))
  (global (;0;) (mut i32) (i32.const 66576))  ;; SP = 66576 (stack pointer)
  (export "compute_senseless" (func 0)))
So the function first decreases SP by 4096 bytes to allocate h, executes h.c[5] = seed, pushes h.c[seed % 1024] onto the (wasm) stack, and then restores SP.
If two instances of this module are running in parallel on two different WebWorkers, h.c[5] = seed could cause a data race, because both use the same hard-coded SP as their stack base.
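To make the race concrete, here is a small host-side C++ model (ordinary C++, not wasm) of the frame that compute_senseless allocates, assuming only what the WAST above shows: h occupies [SP - 4096, SP) and h.c[5] is stored at offset 20. The names kFrameSize, c5_address and frames_overlap are made up for this sketch; the static_asserts check the arithmetic at compile time.

```cpp
#include <cstdint>

// Hypothetical model of the frame built by compute_senseless (see the WAST):
// h occupies [SP - 4096, SP) in linear memory, and h.c[5] is at offset 20.
constexpr std::uint32_t kFrameSize = 4096;  // sizeof(senseless) = 1024 * 4
constexpr std::uint32_t kC5Offset = 5 * 4;  // byte offset of h.c[5] inside h

// Linear-memory address written by `h.c[5] = seed` for a given incoming SP.
constexpr std::uint32_t c5_address(std::uint32_t sp) {
  return sp - kFrameSize + kC5Offset;
}

// True if the frames [sp_a - 4096, sp_a) and [sp_b - 4096, sp_b) share a byte.
constexpr bool frames_overlap(std::uint32_t sp_a, std::uint32_t sp_b) {
  return sp_a - kFrameSize < sp_b && sp_b - kFrameSize < sp_a;
}

// Two instances starting from the same hard-coded SP (66576) write h.c[5]
// to the very same address, so concurrent calls race on it.
static_assert(c5_address(66576) == 62500, "both instances target address 62500");
static_assert(frames_overlap(66576, 66576), "identical SPs always overlap");

// Giving the second instance an SP at least one frame lower removes the overlap.
static_assert(!frames_overlap(66576, 66576 - kFrameSize), "adjacent frames are disjoint");
```

Compiling it with any C++11 compiler is enough to run the checks; the last assertion is the property each instance's SP would have to satisfy.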
To force each instance to have its own non-overlapping region for the stack, I need to modify the stack pointer, but I don't know how to do it. I have some ideas, though (using wasi-libc or emscripten is overkill for me):
As I said at the beginning: modifying __stack_pointer directly, but I can't make it happen (the error is shown above). Also, I don't know whether anything else needs to be considered apart from keeping it 16-byte aligned.
Clang already defines the function __wasm_init_tls, but its body is empty and nothing calls it (the symbol shows up when using the --export-all flag). I don't know whether that function is usable at all. I'm aware that clang supports relocation sections, dynamic linking, etc., but I don't understand these options well enough to know how to use them for my purpose.
Or... doing it by hand with a WAST function that sets the pointer and that I have to call manually just after instantiation:
// C
void place_SP(int addr); // Implemented in WAST.
/* To be called from JavaScript; I'll make sure that this call never happens concurrently, that SP is a multiple of 16 (stack base is 16-byte aligned), and that there is enough space to avoid stack overlapping. */
WASM_EXPORT void init_stack(int SP) { place_SP(SP); }
// place_SP.wast
(func $place_SP (param $addr i32)
(global.set 0 (local.get $addr)))
but I'm unsure about what else I need to write in place_SP.wast (is it a different module? do I need to specify a (type) entry?), and how I should modify my makefile to properly compile place_SP.wast and link it with senseless.o to produce a valid senseless.wasm.
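The comment on init_stack lists the invariants the caller promises: every SP is a multiple of 16 and the stacks never overlap. A hypothetical checker for a set of proposed stack tops could look like the sketch below (C++14, for the loops inside constexpr). stacks_are_valid, its parameters and the assumption that every stack has the same size are illustrative only; lowest stands for whatever address the extra stacks must not go below.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check of the invariants listed in the init_stack comment.
// Stack i grows downward from tops[i] and occupies [tops[i] - stack_size, tops[i]).
// Returns true only if every top is 16-byte aligned, every stack fits inside
// [lowest, memory_size], and no two stacks share a byte.
constexpr bool stacks_are_valid(const std::uint32_t* tops, std::size_t count,
                                std::uint32_t stack_size, std::uint32_t lowest,
                                std::uint32_t memory_size) {
  for (std::size_t i = 0; i < count; ++i) {
    const std::uint32_t top = tops[i];
    if (top % 16 != 0) return false;                              // not a multiple of 16
    if (top > memory_size) return false;                          // beyond the end of memory
    if (top < lowest || top - lowest < stack_size) return false;  // would run below `lowest`
    for (std::size_t j = 0; j < i; ++j) {
      const std::uint32_t other = tops[j];  // already validated in an earlier iteration
      // Do the half-open ranges [top - size, top) and [other - size, other) intersect?
      if (top - stack_size < other && other - stack_size < top) return false;
    }
  }
  return true;
}
```

For example, with 32768-byte stacks, tops of 66576 and 131072 pass, 66570 fails the alignment check, and 66576 together with 66560 fails the overlap check.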
ADDITIONAL INFO:
Makefile:
.SUFFIXES: .wasm
# I'm using wasi-sdk 12.
WASI_SDK := <my-wasi-sdk-path>
CXX := $(WASI_SDK)/bin/clang
LD := $(WASI_SDK)/bin/wasm-ld
CPPFLAGS := --sysroot=$(WASI_SDK)/share/wasi-sysroot
CXXFLAGS := -O3 -flto -fno-exceptions
LDFLAGS := --lto-O3 -E --no-entry --import-memory --max-memory=131072 \
           --features=atomics,bulk-memory --shared-memory

senseless.wasm: senseless.o
	$(LD) --verbose $(LDFLAGS) $^ -o $@
	wasm-opt $@ -Oz -o $@
	wasm-strip $@
	wasm2wat -f --enable-threads --enable-bulk-memory $@ > $*.wast

senseless.o: senseless.cpp
	$(CXX) -v -c $(CPPFLAGS) $(CXXFLAGS) $< -o $@

clean:
	$(RM) senseless.wasm senseless.o
Compiler logging output (only what I consider to be the relevant stuff):
# Internal compiler call (I have omitted options regarding paths):
"<wasi-sdk>/bin/clang-11" -cc1 -triple wasm32-unknown-wasi -emit-llvm-bc -flto -flto-unit -disable-free -disable-llvm-verifier -discard-value-names -main-file-name senseless.cpp -mrelocation-model static -mframe-pointer=none -fno-rounding-math -mconstructor-aliases -target-cpu generic -fvisibility hidden -debugger-tuning=gdb -v -O3 -fdeprecated-macro -ferror-limit 19 -fgnuc-version=4.2.1 -fcolor-diagnostics -vectorize-loops -vectorize-slp -o senseless.o -x c senseless.cpp
# Memory layout (linker output)
wasm-ld: mem: global base = 1024 # I wonder what the first 1024 bytes are used for.
wasm-ld: mem: __wasm_init_memory_flag offset=1024 size=4 align=4
wasm-ld: mem: static data = 4 # No .rodata/.bss in this example, only the (unused) i32 flag.
wasm-ld: mem: stack size = 65536 # One page (64KiB) for the stack, growing down from 66576 to 1040.
wasm-ld: mem: stack base = 1040 # I wonder what the bytes from 1028 to 1039 are used for.
wasm-ld: mem: stack top = 66576
wasm-ld: mem: heap base = 66576 # Heap from 66576 up to 131072.
wasm-ld: mem: total pages = 2
wasm-ld: mem: max pages = 2
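The numbers in this log are consistent with straightforward arithmetic: static data starts at the global base, the stack base is the end of static data rounded up to a 16-byte boundary (which would make 1028 to 1039 alignment padding), the stack top is base plus stack size, and the heap starts at the stack top. A constexpr sketch reproducing them, assuming a 16-byte stack alignment and the 64 KiB WebAssembly page size:

```cpp
#include <cstdint>

constexpr std::uint32_t kPageSize = 65536;     // WebAssembly page size (64 KiB)
constexpr std::uint32_t kStackAlignment = 16;  // assumed stack alignment

// Round `value` up to the next multiple of `align` (align must be a power of two).
constexpr std::uint32_t align_up(std::uint32_t value, std::uint32_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Values taken from the wasm-ld --verbose output above.
constexpr std::uint32_t kGlobalBase = 1024;
constexpr std::uint32_t kStaticData = 4;  // only __wasm_init_memory_flag
constexpr std::uint32_t kStackSize = 65536;

constexpr std::uint32_t kStackBase = align_up(kGlobalBase + kStaticData, kStackAlignment);
constexpr std::uint32_t kStackTop = kStackBase + kStackSize;  // SP starts here, grows down
constexpr std::uint32_t kHeapBase = kStackTop;
constexpr std::uint32_t kPagesNeeded = (kHeapBase + kPageSize - 1) / kPageSize;

static_assert(kStackBase == 1040, "1028 rounded up to a multiple of 16");
static_assert(kStackTop == 66576, "matches the initial __stack_pointer in the WAST");
static_assert(kPagesNeeded == 2, "66576 bytes need 2 pages");
```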
NOTE: I have tried to customize the stack size by adding -z,stack-size=131072 to LDFLAGS so that, for example, the stack is two pages long instead of one (while also increasing --max-memory to three pages), but none of stack base/size/top change at all.
SOLUTION
I made it work based on the accepted answer. I added a file called stack-trick.S (the .hidden directive keeps place_SP from being exported, which would otherwise happen by default):
.globl place_SP
.hidden place_SP
.globaltype __stack_pointer, i32
place_SP:
.functype place_SP(i32) -> ()
local.get 0
global.set __stack_pointer
end_function
Now, in my senseless.cpp code, I added:
extern "C" void place_SP(int);
WASM_EXPORT void init_stack(/* some args */)
{
int SP = /* some computation based on parameters */;
place_SP(SP);
}
and in the makefile, I added:
senseless.wasm: stack-trick.o
stack-trick.o: stack-trick.S
	$(CXX) -c $< -o $@
and senseless.wast now has a fancy new exported function (init_stack) in which, in addition, the call to place_SP has been inlined.
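The body of init_stack is deliberately left open above. Purely as an illustration, one way to derive per-instance stack tops is to carve a reserved region of linear memory into equal, 16-byte aligned slots, one per instance. region_end, stack_size, instance_index and stack_top_for are hypothetical names, not part of the original code. Setting the pointer per instance works because __stack_pointer is a wasm global, and a module's own globals belong to each instance even when the linear memory is shared.

```cpp
#include <cstdint>

// Round down / up to a multiple of 16, the stack alignment mentioned in the question.
constexpr std::uint32_t align_down16(std::uint32_t v) { return v & ~std::uint32_t{15}; }
constexpr std::uint32_t align_up16(std::uint32_t v) { return (v + 15) & ~std::uint32_t{15}; }

// Hypothetical slot layout: slots are stacked downward from region_end, and
// instance k gets [top_k - slot, top_k) with slot = align_up16(stack_size).
// Consecutive tops are exactly one slot apart, so the stacks never overlap.
constexpr std::uint32_t stack_top_for(std::uint32_t region_end,
                                      std::uint32_t stack_size,
                                      std::uint32_t instance_index) {
  return align_down16(region_end) - instance_index * align_up16(stack_size);
}

static_assert(stack_top_for(131072, 16384, 0) == 131072, "first instance starts at the region end");
static_assert(stack_top_for(131072, 16384, 1) == 114688, "second instance is one slot lower");
static_assert(stack_top_for(131075, 100, 2) % 16 == 0, "tops stay 16-byte aligned");
```

init_stack could then pass such a value to place_SP; making sure the lowest slot stays above the static data and does not collide with memory used by anything else is still up to the caller.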
ANSWER
Firstly, if you are doing multi-threading with emscripten, then each thread will already have its own stack and its own value for __stack_pointer. That is part of what defines a thread.
If you still want to manipulate the stack yourself (perhaps to have many stacks within a single thread), then you can use the emscripten helper functions stackSave (to get the SP of the current thread) and stackRestore (to set the SP of the current thread).
If you are not using emscripten at all, then you are in uncharted territory (what runtime are you using? how are you starting new threads?), but the simplest way to manipulate the stack pointer would be with assembly code. See how emscripten implements these functions:
https://github.com/emscripten-core/emscripten/blob/main/system/lib/compiler-rt/stack_ops.S
So you could do something like this:
.globl place_SP
.globaltype __stack_pointer, i32
place_SP:
.functype place_SP(i32) -> ()
local.get 0
global.set __stack_pointer
end_function
Then compile that code with clang -c place_sp.s -o place_sp.o