How can old-school multi-threading (no wrapping mutex) be achieved in Rust? And why is it undefined behavior?
I have to build a highly concurrent physics simulation. I am supposed to do it in C, but I chose Rust (I really needed higher-level features).
By using Rust, I should opt for safe communication between threads; however, I must use a mutable buffer shared between the threads (actually, I have to implement several techniques and benchmark them).
First approach
Use Arc<Data> to share the non-mutable state. Use transmute to promote & to &mut when needed.
It was straightforward, but the compiler refuses to compile this even inside an unsafe block (the deny-by-default mutable_transmutes lint). Transmuting & to &mut is undefined behavior: the compiler may optimize on the assumption that data behind a shared reference is never mutated (for example, caching a read and never reloading it).
That assumption is only opted out of by the UnsafeCell wrapper, which Cell and the other interior-mutability types are built on.
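For reference, this is roughly what the rejected idea looks like (Data is an illustrative placeholder). It does not compile: the mutable_transmutes lint rejects it even inside unsafe, since the transmute is always undefined behavior.

use std::sync::Arc;

type Data = Vec<f32>;

// Rejected sketch, do not use: rustc's deny-by-default mutable_transmutes
// lint refuses to compile a transmute from a shared to a mutable reference.
fn promote(shared: &Arc<Data>) -> &mut Data {
    unsafe { std::mem::transmute::<&Data, &mut Data>(&**shared) }
}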
Second approach
Use Arc<UnsafeCell<Data>>, then data.get() to access the data.
This does not compile either: UnsafeCell is not Sync, so the Arc cannot be shared across threads. The solution would be SyncUnsafeCell, but it is still unstable for the moment (1.66), and the program will be compiled and put into production on a machine with only the stable toolchain.
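On stable, the usual workaround is to hand-roll what SyncUnsafeCell does: a newtype around UnsafeCell with an unsafe impl Sync. A minimal sketch (SharedCell is a made-up name; the unsafe impl is a promise the compiler does not check, so freedom from data races is entirely on you):

use std::cell::UnsafeCell;

// Hypothetical stand-in for the unstable SyncUnsafeCell, so that
// Arc<SharedCell<Data>> can be shared across threads on stable Rust.
pub struct SharedCell<T>(UnsafeCell<T>);

// SAFETY: the user must guarantee that all accesses through the raw
// pointer are free of data races; the compiler cannot verify this.
unsafe impl<T: Sync> Sync for SharedCell<T> {}

impl<T> SharedCell<T> {
    pub fn new(value: T) -> Self {
        Self(UnsafeCell::new(value))
    }

    // Same role as UnsafeCell::get: a raw pointer to the inner value.
    pub fn get(&self) -> *mut T {
        self.0.get()
    }
}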
Third approach
Use Arc<Mutex<Data>>. At the beginning of each thread:
Lock the mutex.
Keep a *mut by coercing a &mut.
Release the mutex.
Use the *mut when needed.
I haven't tried this one yet (a sketch follows below), but even if it compiles, is it as safe (not talking about data races) as it would be with SyncUnsafeCell?
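For concreteness, here is a sketch of those steps (names are illustrative); whether using the *mut after the guard has been dropped is sound is exactly the open question:

use std::sync::{Arc, Mutex};

type Data = Vec<f32>;

// Illustrative sketch of the third approach: lock once, keep a raw pointer,
// drop the guard, then write through the pointer with no further locking.
fn worker(shared: Arc<Mutex<Data>>, index: usize, value: f32) {
    let ptr: *mut Data = {
        let mut guard = shared.lock().unwrap();
        &mut *guard as *mut Data
        // the guard is dropped here, releasing the mutex while ptr lives on
    };
    unsafe {
        (*ptr)[index] = value;
    }
}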
PS: The values concurrently mutated are just f32; there are absolutely no memory allocations or other complex operations happening concurrently. Worst case, I end up with some scrambled f32 values.
CodePudding user response:
Disclaimer: There are probably many ways to solve this, this is just one of them, based on the idea of @Caesar.
Two main points of this post:
- You can use AtomicU32 to share f32 between threads without any performance penalty (given an architecture where u32 is already atomic).
- You can use std::thread::scope to avoid the overhead of Arc.
use std::{
    fmt::Debug,
    ops::Range,
    sync::atomic::{AtomicU32, Ordering},
};

/// An f32 stored as its raw bit pattern in an AtomicU32, so it can be
/// shared between threads and updated without a lock.
struct AtomicF32(AtomicU32);

impl AtomicF32 {
    pub fn new(val: f32) -> Self {
        Self(AtomicU32::new(val.to_bits()))
    }
    pub fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }
    pub fn store(&self, val: f32, order: Ordering) {
        self.0.store(val.to_bits(), order)
    }
}

impl Debug for AtomicF32 {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        self.load(Ordering::Relaxed).fmt(f)
    }
}

// Each worker only touches its own index range, so plain relaxed
// loads and stores are enough here.
fn perform_action(data: &Vec<AtomicF32>, range: Range<usize>) {
    for value_raw in &data[range] {
        let mut value = value_raw.load(Ordering::Relaxed);
        value *= 2.5;
        value_raw.store(value, Ordering::Relaxed);
    }
}

fn main() {
    let data = (1..=10)
        .map(|v| AtomicF32::new(v as f32))
        .collect::<Vec<_>>();

    println!("Before: {:?}", data);

    // Scoped threads may borrow `data` directly, so no Arc is needed.
    std::thread::scope(|s| {
        s.spawn(|| perform_action(&data, 0..5));
        s.spawn(|| perform_action(&data, 5..10));
    });

    println!("After: {:?}", data);
}
Before: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
After: [2.5, 5.0, 7.5, 10.0, 12.5, 15.0, 17.5, 20.0, 22.5, 25.0]
To demonstrate how lightweight this is, here is what this compiles to:
use std::sync::atomic::{AtomicU32, Ordering};

pub struct AtomicF32(AtomicU32);

impl AtomicF32 {
    fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }
    fn store(&self, val: f32, order: Ordering) {
        self.0.store(val.to_bits(), order)
    }
}

pub fn perform_action(value_raw: &AtomicF32) {
    let mut value = value_raw.load(Ordering::Relaxed);
    value *= 2.5;
    value_raw.store(value, Ordering::Relaxed);
}
.LCPI0_0:
        .long   0x40200000
example::perform_action:
        movss   xmm0, dword ptr [rdi]
        mulss   xmm0, dword ptr [rip + .LCPI0_0]
        movss   dword ptr [rdi], xmm0
        ret
Note that while this contains zero undefined behaviour, it is still the programmer's responsibility to avoid read-modify-write race conditions.
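If two threads may hit the same element, a compare-and-swap loop makes each read-modify-write a single atomic step. A sketch (assuming the AtomicF32 wrapper from the example above; fetch_mul is a hypothetical addition built on AtomicU32::fetch_update):

impl AtomicF32 {
    // Atomically multiplies the stored value by `factor`, retrying the
    // compare-and-swap until no other thread raced in between.
    // Returns the previous value.
    pub fn fetch_mul(&self, factor: f32) -> f32 {
        let prev = self
            .0
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |bits| {
                Some((f32::from_bits(bits) * factor).to_bits())
            })
            .expect("the update closure never returns None");
        f32::from_bits(prev)
    }
}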