fn f() -> [u8; N] performance dependent on the value of N?

Time:10-23

For a function that returns an array by value, such as

const N: usize = 1048576;
fn f(i: usize) -> [u8; N] {
    let mut xs = [0; N];
    xs[i % N] = 1;
    xs
}

Is the performance of this function dependent on the value of N? If I return a larger array, is this function slower?

CodePudding user response:

Yes, the performance depends on the size of N.

I ran this test, to check:

#![feature(test)]

fn f<const N: usize>(i: usize) -> [u8; N] {
    let mut xs = [0; N];
    xs[i % N] = 1;
    xs
}

#[cfg(test)]
mod tests {
    extern crate test;
    use super::*;
    use test::{black_box, Bencher};

    #[bench]
    fn size_1(bencher: &mut Bencher) {
        bencher.iter(|| {
            for i in 0..10000 {
                black_box(f::<1>(i));
            }
        })
    }

    #[bench]
    fn size_10(bencher: &mut Bencher) {
        bencher.iter(|| {
            for i in 0..10000 {
                black_box(f::<10>(i));
            }
        })
    }

    #[bench]
    fn size_100(bencher: &mut Bencher) {
        bencher.iter(|| {
            for i in 0..10000 {
                black_box(f::<100>(i));
            }
        })
    }

    #[bench]
    fn size_1000(bencher: &mut Bencher) {
        bencher.iter(|| {
            for i in 0..10000 {
                black_box(f::<1000>(i));
            }
        })
    }

    #[bench]
    fn size_10000(bencher: &mut Bencher) {
        bencher.iter(|| {
            for i in 0..10000 {
                black_box(f::<10000>(i));
            }
        })
    }

    #[bench]
    fn size_100000(bencher: &mut Bencher) {
        bencher.iter(|| {
            for i in 0..10000 {
                black_box(f::<100000>(i));
            }
        })
    }
}

And this was the output:

running 6 tests
test tests::size_1      ... bench:       2,760 ns/iter (+/- 357)
test tests::size_10     ... bench:      50,241 ns/iter (+/- 3,442)
test tests::size_100    ... bench:      77,154 ns/iter (+/- 36,252)
test tests::size_1000   ... bench:     213,970 ns/iter (+/- 9,205)
test tests::size_10000  ... bench:   1,985,268 ns/iter (+/- 267,095)
test tests::size_100000 ... bench:  66,256,774 ns/iter (+/- 5,980,973)

As you can see, the times do not increase linearly with N; the jumps between sizes are uneven. I ran it a few times and got similar results.
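Normalizing the figures above to nanoseconds per byte makes the non-linearity easier to see. This is just a back-of-envelope computation on the reported numbers (each `ns/iter` covers 10,000 calls over N bytes):

```rust
fn main() {
    // (N, ns/iter) pairs taken from the benchmark output above.
    let results: [(usize, u64); 5] = [
        (10, 50_241),
        (100, 77_154),
        (1_000, 213_970),
        (10_000, 1_985_268),
        (100_000, 66_256_774),
    ];
    for (n, ns) in results {
        // Each iteration calls f 10,000 times, touching N bytes per call.
        let ns_per_byte = ns as f64 / (10_000.0 * n as f64);
        println!("N = {:>6}: {:.4} ns/byte", n, ns_per_byte);
    }
}
```

The per-byte cost falls as N grows (fixed call overhead is amortized), then rises again at N = 100,000, consistent with the working set spilling out of a cache level.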

Much of the time is spent creating the array and zeroing it, which takes longer the bigger the array. However, writing an arbitrary index can also cost time, because the relevant cache lines have to be loaded and written back.
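You can observe the same size-dependent cost on stable Rust, without the nightly `test` crate. This is a rough sketch using `std::time::Instant` and `std::hint::black_box` (stable since 1.66), not a rigorous benchmark; the absolute timings will vary by machine:

```rust
use std::hint::black_box;
use std::time::Instant;

// Generic version of the function from the question.
fn f<const N: usize>(i: usize) -> [u8; N] {
    let mut xs = [0u8; N];
    xs[i % N] = 1;
    xs
}

fn main() {
    // black_box keeps the optimizer from eliding the array entirely.
    let start = Instant::now();
    for i in 0..10_000 {
        black_box(f::<1_000>(black_box(i)));
    }
    let small = start.elapsed();

    let start = Instant::now();
    for i in 0..10_000 {
        black_box(f::<100_000>(black_box(i)));
    }
    let large = start.elapsed();

    println!("N = 1000:   {:?}", small);
    println!("N = 100000: {:?}", large);
}
```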

These sorts of factors will likely cause big differences in the performance pattern depending on your hardware: for example, the sizes of the various memory caches and the prefetch strategy of the specific architecture. I ran these tests on a very old Mac, and you'll probably see quite different results on a newer machine.

For comparison, I removed the line that changed a value in the array and re-ran the benchmarks. The results are quite different:
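Presumably the modified function looked something like this (the name `f_zeroed` is mine, for illustration); only the zero-initialization and the return-by-value copy remain:

```rust
// Same signature as f, but without the write into the array.
fn f_zeroed<const N: usize>(_i: usize) -> [u8; N] {
    [0; N]
}

fn main() {
    // Sanity check: the result is all zeroes regardless of the index argument.
    let xs = f_zeroed::<8>(5);
    assert_eq!(xs, [0u8; 8]);
}
```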

test tests::size_1      ... bench:       2,684 ns/iter (+/- 185)
test tests::size_10     ... bench:       5,323 ns/iter (+/- 315)
test tests::size_100    ... bench:      18,718 ns/iter (+/- 573)
test tests::size_1000   ... bench:      98,695 ns/iter (+/- 4,765)
test tests::size_10000  ... bench:     856,343 ns/iter (+/- 57,132)
test tests::size_100000 ... bench:  27,973,952 ns/iter (+/- 5,565,646)

To determine exactly why this is, you would need to run the code under a profiler and see where the time is spent.
