What's the fastest sound way to take ownership of elements of a Vec-CodePudding

Suppose I have the following:

fn into_three_tuple(mut v: Vec<String>) -> (String, String, String) {
  if v.len() == 3 {
    // ???
  } else {
    panic!()
  }
}

What's should I replace ??? with to achieve the best performance?

Possible solutions

Sure, I could do

...
if v.len() == 3 {
  let mut iter = v.into_iter();
  (v.next().unwrap(), v.next().unwrap(), v.next().unwrap())
} else {
...

or similarly

if v.len() == 3 {
  let mut iter = v.into_iter();
  let e2 = v.pop.unwrap();
  let e1 = v.pop.unwrap();
  let e0 = v.pop.unwrap();
  (e0, e1, e2)
} else {
...

Problems with those solutions

Both of these implementations use unwrap, which, if I understand correctly, performs a runtime check. But since we have the v.len() == 3 condition, we know the vector is guaranteed to have 3 elements, so the the runtime check is unnecessary.

Also, the into_iter solution may introduce the additional overhead of creating the iterator, and the pop solution may introduce the additional overhead of decreasing the v's internal len field, which seems silly since v will be dropped immediately after extracting the elements (so we don't care whether its len field is accurate).

Question

Is there some (possibly unsafe) way that's more efficient (e.g., some way to directly take ownership of elements at arbitrary indices)?

Or perhaps the compiler is already smart enough to skip these extraneous operations?

Or do I just have to live with suboptimal performance?

Note

In case you're wondering why I'm obsessing over such a tiny micro-optimization, I'm writing a fairly speed-critical application, and this function will be called a significant amount of times.

CodePudding user response：

There is an operation provided in the standard library to extract a fixed number of items from a vector: converting it to an array.

use std::convert::TryInto;  // needed only if not using 2021 edition / Rust 1.55 or earlier
pub fn into_three_tuple(v: Vec<String>) -> (String, String, String) {
    let three_array: [String; 3] = v.try_into().unwrap();
    let [a, b, c] = three_array;
    (a, b, c)
}

I'm not really familiar with reading x86 assembly, but this does compile down to simpler code with fewer branches. I would generally expect that this is in most cases the fastest way to unpack a three-element vector; if it is reliably slower, then that would be a performance bug in the standard library which should be reported and fixed.

You should also consider using an array [String; 3] instead of a tuple in the rest of your program. The type is shorter to write, and they allow you to use array and slice operations to act on all three strings. Additionally, tuples do not have a guaranteed memory layout, and arrays do (even though practically they are likely to be identical in this case).

Changing the return type to be an array makes the function trivial — possibly useful for the type declaration, but not containing any interesting code:

pub fn into_three_array(v: Vec<String>) -> [String; 3] {
    v.try_into().unwrap()
}

CodePudding user response：

Disclaimer: As mentioned in the comments, you should benchmark to check that any of this actually makes a difference in your program. The fact that you're using many 3-element vectors (which are heap-allocated and therefore comparatively inefficient) shows that you may be over-optimizing, or optimizing at the wrong place. Having said that...

the into_iter solution may introduce the additional overhead of creating the iterator

Note that the "iterator" is a tiny on-stack value entirely transparent to the compiler, which can proceed to inline/eliminate it entirely.

Or perhaps the compiler is already smart enough to skip these extraneous operations?

In many cases, a check for v.len() == <concrete number> is indeed sufficient for the compiler to omit bounds checking because it has proof of the vector size. However, that doesn't appear to work with the approaches you've tried. After modifying the code to std::process::exit() if v.len() != 3 so the only panic is from the runtime checks, the runtime checks (as evidence by calls to panic) are still not removed either with the .pop() or with the into_iter() approach.

Is there some (possibly unsafe) way that's more efficient (e.g., some way to directly take ownership of elements at arbitrary indices)?

Yes. One approach is to use unreachable_unchecked() to avoid the panic where we can prove the calls to next() will succeed:

use std::hint::unreachable_unchecked;

pub fn into_three_tuple(v: Vec<String>) -> (String, String, String) {
    if v.len() == 3 {
        let mut v = v.into_iter();
        unsafe {
            let e0 = v.next().unwrap_or_else(|| unreachable_unchecked());
            let e1 = v.next().unwrap_or_else(|| unreachable_unchecked());
            let e2 = v.next().unwrap_or_else(|| unreachable_unchecked());
            (e0, e1, e2)
        }
    } else {
        panic!()
    }
}

Modifying the code in the same way as the above shows no panic-related code.

Still, that relies on the compiler being smart enough. If you want to ensure the bound checks are not done, Rust unsafe allows you to do that as well. You can use as_ptr() to obtain a raw pointer to the elements stored in the vector, and read them from there directly. You need to call set_len() to prevent the vector from dropping the elements you've moved, but to still allow it to deallocate the storage.

pub fn into_three_tuple(mut v: Vec<String>) -> (String, String, String) {
    if v.len() == 3 {
        unsafe {
            v.set_len(0);
            let ptr = v.as_ptr();
            let e0 = ptr.read();
            let e1 = ptr.add(1).read();
            let e2 = ptr.add(2).read();
            (e0, e1, e2)
        }
    } else {
        panic!("expected Vec of length 3")
    }
}

The generated code again shows no bound check related panics, as there can be none.