Home > OS >  How to remove duplicate in an array in Rust?
How to remove duplicate in an array in Rust?

Time:07-10

I have generated an array of numbers. I would like to remove the duplicates. In javascript I can just use [...new Set(arr)] and get the job done.

In Rust, I havn't found a simple way to achieve that so far.

I've written:

use rand::{thread_rng, Rng};
use itertools::Itertools;

fn main() {
    let mut arr:Vec<u8> = Vec::new();
    for _ in 0..10 {
        arr.push(thread_rng().gen_range(0..10))
    }
    println!("random {:?}", arr);
    arr.iter().unique();
    println!("unique {:?}", arr);
}

The output are:

random [7, 0, 3, 6, 7, 7, 1, 1, 8, 6]
unique [7, 0, 3, 6, 7, 7, 1, 1, 8, 6]

So I've tried to get the "no duplicate" result in another variable:

let res = &arr.iter().unique();

The result was:

Unique { iter: UniqueBy { iter: Iter([1, 2, 0, 0, 7, 0, 2, 2, 1, 6]), used: {} } }

Also, it seems I can't sort the array before performing the removal of duplicate. This code returns an error: no method named 'iter' found for unit type '()' in the current scope method not found in '()'.

arr.sort().iter().unique();

Also, maybe there is a way to achieve the sort unique value output without external crates?

CodePudding user response:

Using the standard library

Usually, sorting an array is an ok way of deduplicating it, but, except if you are using a radix sort (which is not the sorting method Rust uses), it's asymptotically better to what you would do in JS. Here is the Rust equivalent:

let a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
let uniqued_vector = a
    .into_iter()
    .collect::<HashSet<_>>()
    .into_iter()
    .collect::<Vec<_>>();

This will turn your array into an iterator, then that iterator into a HashSet (which will deduplicate it), then back again into an iterator form, then finally into an array.

See it on the playground.


If you wonder why we have to have to go back and forth between these iterator representations, it's because they are the "interface" Rust uses to transform any datatype into any other datatype, very efficiently and while allowing you to perform some operations along the way easily. Here we don't actually need to do anything more than the conversion so that's why it may seem a little bit verbose.

Using the itertools crate

The itertools crate provides utilities to work on iterators (the same that we use as an interface to convert between datatypes). However, a peculiarity of iterators is that they are lazy, in a way, in the sense that they, in themselves, are not a datatype used to store information. They only represent operations performed, through the iterable interface, on a collection. For this reason, you actually need to transform an iterator back into a usable collection (or consume it in any way), otherwise it will do nothing (literally).

So the correct version of your code would probably be

let a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
let uniqued_vector = a
    .into_iter()
    .unique()
    .collect::<Vec<_>>();

You don't need to sort anything because, internally, .unique() works pretty much like the first implementation.

Sorting the array

As said earlier, sorting the array is fine, so you might still want to do that. However, unlike previous solutions, this won't involve only iterators because you can't sort an iterator (there is no such method provided by the Iterator trait, nor by the actual type produced by a_vector.into_iter())! However, once you have sorted the array, you may want to deduplicate it, that is, remove consecutive repetitions, which is also not provided by the Iterator trait. However, both of these are actually simply provided by Vec, so the solution is simply:

let mut a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
a_vector.sort();
a_vector.dedup();

And then a_vector contains unique elements.

Note that this is only true if you use only the standard library. Itertools provides both a sorted method and a dedup one, so with itertools you could do:

let a_vector = vec![7, 0, 3, 6, 7, 7, 1, 1, 8, 6];
let uniqued_vector = a_vector
    .into_iter()
    .sorted()
    .dedup()
    .collect::<Vec<_>>();

But at this point you'd be better off using .unique().


If you wonder what the difference between .iter() and .into_iter(), see this question.

CodePudding user response:

Inspired from the example of retain(), we can rely on an intermediate sequence of booleans (see soution_1() below).

I don't like very much the idea of an intermediate storage, but I guess (I should have benchmarked, but I did not) that this simple vector of booleans is less expensive than creating a HashSet or sorting.

For an in-place solution, I'm afraid we have to write the algorithm by ourselves. The trick relies on split_at_mut() in order to avoid a multiple-borrows issue (see solution_2() below).

use rand::{thread_rng, Rng};

fn solution_1(mut arr: Vec<u8>) -> Vec<u8> {
    println!("random {:?}", arr);
    let keep: Vec<_> = arr
        .iter()
        .enumerate()
        .map(|(i, x)| !arr[0..i].contains(x))
        .collect();
    let mut keep_iter = keep.iter();
    arr.retain(|_| *keep_iter.next().unwrap());
    println!("unique {:?}", arr);
    arr
}

fn solution_2(mut arr: Vec<u8>) -> Vec<u8> {
    println!("random {:?}", arr);
    let mut kept = 0;
    for i in 0..arr.len() {
        let (head, tail) = arr.split_at_mut(i);
        let x = tail.first_mut().unwrap();
        if !head[0..kept].contains(x) {
            if kept != i {
                std::mem::swap(&mut head[kept], x);
            }
            kept  = 1;
        }
    }
    arr.resize(kept, Default::default());
    println!("unique {:?}", arr);
    arr
}

fn main() {
    let mut arr: Vec<u8> = Vec::new();
    for _ in 0..20 {
        arr.push(thread_rng().gen_range(0..10))
    }
    assert_eq!(solution_1(arr.clone()), solution_2(arr));
}
/*
random [2, 3, 6, 8, 4, 1, 1, 9, 1, 9, 1, 6, 2, 5, 5, 4, 0, 0, 5, 4]
unique [2, 3, 6, 8, 4, 1, 9, 5, 0]
random [2, 3, 6, 8, 4, 1, 1, 9, 1, 9, 1, 6, 2, 5, 5, 4, 0, 0, 5, 4]
unique [2, 3, 6, 8, 4, 1, 9, 5, 0]
*/

CodePudding user response:

Several things:

  1. .unique() returns an iterator. You have to turn it back into a vector with .collect(). This section of the Rust book contains the basic rules around using iterators

  2. .sort() modifies the vector in place. Unit type means an expression has no type, functions can sometimes not have any output and just have side effects (like modyfing something in place). Try:

    arr.sort();

    let uniques = arr.iter().unique().collect()

  • Related