Home > Mobile >  Rust sorting uses surprisingly few comparaisons
Rust sorting uses surprisingly few comparaisons

Time:09-07

I am currently learning Rust (using the Rust book), and one page mentions counting the number of times the sorting key was used while sorting an array. I modified the code in order to count this for arbitrary sizes, and here is the code :

fn main() {
    const MAX: i32 = 10000;
    for n in 1..MAX {
        let mut v: Vec<i32> = (1..n).collect();
        let mut ops = 0;
        
        v.sort_by(|x, y| {
            ops  = 1;
            x.cmp(y)
        });
        
        if n-2 >= 0 {
            assert_eq!(n-2, ops);
        }
        // println!("A list of {n} elements is sorted in {ops} operations");

    }
}

However, it seems that in order to sort a vector of n elements, Rust only needs n-2 comparaisons (the code above runs without panicking). How can this be possible ? Aren't sorts supposed to be in O(n*log(n)) ? Is it because Rust somehow "noticed" that my input vector was already sorted ?

Even in that case, how can a vector of length 2 can be sorted without any comparaisons ? Shouldn't it at least be n-1 ?

CodePudding user response:

The biggest misconseption you have, I think, is:

fn main() {
    const SIZE: i32 = 5;
    let v: Vec<i32> = (1..SIZE).collect();
    println!("{}", v.len());
}
4

The range 1..SIZE does not include SIZE and contains SIZE-1 elements. Further, it will already be sorted, so it's as simple as iterating through it once.

See here:

fn main() {
    const SIZE: i32 = 5;

    let mut v: Vec<i32> = (1..SIZE).collect();
    let mut ops = 0;

    v.sort_by(|x, y| {
        ops  = 1;
        let result = x.cmp(y);
        println!(" - cmp {} vs {} => {:?}", x, y, result);
        result
    });

    println!("Total comparisons: {}", ops);
}
 - cmp 4 vs 3 => Greater
 - cmp 3 vs 2 => Greater
 - cmp 2 vs 1 => Greater
Total comparisons: 3

it seems that in order to sort a vector of n elements, Rust only needs n-2 comparaisons

That is incorrect. In order to sort an already sorted vector (which yours are), Rust needs n-1 comparisons. It doesn't detect that, that's just an inherent property of the mergesort implementation that Rust uses.

If it isn't already sorted, it will be more:

fn main() {
    let mut v: Vec<i32> = vec![2, 4, 1, 3];

    let mut ops = 0;

    v.sort_by(|x, y| {
        ops  = 1;
        let result = x.cmp(y);
        println!(" - cmp {} vs {} => {:?}", x, y, result);
        result
    });

    println!("Total comparisons: {}", ops);
}
 - cmp 3 vs 1 => Greater
 - cmp 1 vs 4 => Less
 - cmp 3 vs 4 => Less
 - cmp 1 vs 2 => Less
 - cmp 3 vs 2 => Greater
Total comparisons: 5

CodePudding user response:

FYI sort_by:

    pub fn sort_by<F>(&mut self, mut compare: F)
    where
        F: FnMut(&T, &T) -> Ordering,
    {
        merge_sort(self, |a, b| compare(a, b) == Less);
    }

and it actually invokes merge_sort:

/// This merge sort borrows some (but not all) ideas from TimSort, which is described in detail
/// [here](https://github.com/python/cpython/blob/main/Objects/listsort.txt).
///
/// The algorithm identifies strictly descending and non-descending subsequences, which are called
/// natural runs. There is a stack of pending runs yet to be merged. Each newly found run is pushed
/// onto the stack, and then some pairs of adjacent runs are merged until these two invariants are
/// satisfied:
///
/// 1. for every `i` in `1..runs.len()`: `runs[i - 1].len > runs[i].len`
/// 2. for every `i` in `2..runs.len()`: `runs[i - 2].len > runs[i - 1].len   runs[i].len`
///
/// The invariants ensure that the total running time is *O*(*n* \* log(*n*)) worst-case.
#[cfg(not(no_global_oom_handling))]
fn merge_sort<T, F>(v: &mut [T], mut is_less: F)

how can a vector of length 2 be sorted without any comparisons? Shouldn't it at least be n-1?

(1..2) returns a slice of length 1 (start from 1, but less than 2). So, when n == 2 in your code, please note that the length of the vector is one.

Let me demonstrate how it will actually go in the merge_sort if the input is a slice shorter than or equal to 2.

    // MAX_INSERTION: 20
    if len <= MAX_INSERTION {
        // if the len is less than 1, it won't use `is_less` closure to let you count the cmp.
        if len >= 2 {
            for i in (0..len - 1).rev() {
                insert_head(&mut v[i..], &mut is_less); // <- go into `insert_head`.
            }
        }
        return;
    }
fn insert_head<T, F>(v: &mut [T], is_less: &mut F)
where
    F: FnMut(&T, &T) -> bool,
{
    if v.len() >= 2 && is_less(&v[1], &v[0]) // <- here it uses the closure to make comparison.

So if your input is less than 21, short arrays will get sorted in place via insertion sort to avoid allocations.

  • Related