Home > Enterprise >  How to remove duplicates in a vector based on an equality predicate?
How to remove duplicates in a vector based on an equality predicate?

Time:04-20

I have a struct which roughly looks like that:

struct vec3
{
    int x;
    int y;
    int z;

    constexpr bool operator==(const vec3& v) const
    {
        return (x == v.x) && (y == v.y) && (z == v.z);
    }

    constexpr vec3 operator-() const
    {
        return {-x, -y, -z};
    }
};

I then generate a std::vector of vec3 with random values for each coordinates. The function in which it is used requires that no couple of values {v1, v2} in that vector fills v1 == -v2. I obviously need that definition of operator== elsewhere in code, otherwise that problem would be trivial.

I first attempted std::set and std::sort std::unique, but could not find any way to have a struct filing named requirements Compare for that application (which is needed for both approaches).

How can I proceed?

Note:

This is somewhat different from Removing duplicates from a non-sortable vector in which pointers are sorted, and also from C how to remove duplicates from vector of Class type? in which the elements are sortable according to some criteria (I think).

CodePudding user response:

The function in which it is used requires that no couple of values {v1, v2} in that vector fills v1 == -v2

but could not find any way to have a struct filing named requirements Compare for that application (which is needed for both approaches).

It seems to me that you're trying to solve X, but this is the Y in your XY-problem.

It's fairly simple to implement an ordered comparator that satisfies the equality of -v == v. Simply compare absolute values:

// declare in same namespace as vec3 for ADL
vec3 abs(const vec3& v) {
    return {std::abs(v.x), std::abs(v.y), std::abs(v.z)};
}


struct abs_less {
    template< class T, class U>
    constexpr auto operator()( T&& lhs, U&& rhs ) const
        -> decltype(std::forward<T>(lhs) < std::forward<U>(rhs))
    {
        using std::abs; // for integers
        return abs(lhs) < abs(rhs);
    }
};

This comparator works with both std::set and std::sort std::unique. Example with set:

std::set<vec3, abs_less> example;

Of course, you could overload the operator< and use std::less, but usually I would recommend against unusual operator overloads.

CodePudding user response:

I believe the simplest method would be to use std::unordered_set and exploit its second and third template parameters.

Method

  1. Define a hash function for your class

Define a hasher like that:

struct hasher
{
    std::size_t operator()(const value_type&) const
    {
        return 0;
    }
};

This makes all hash collide and forces std::unordered_set to use KeyEqual to determine objects equality.

  1. Define a key equality predicate

Declare a predicate struct which roughly looks like:

struct key_equal
{
    bool operator()(const value_type& a, const value_type& b) const
    {
        
        return (a == b) || <...>;
    }
};

Where <...> is anything you want to put in your predicate making two values identical ( so here a == b) || -a == b for example).

  1. Erase duplicates

Declare an std::unordered_set which removes duplicates for you:

const std::unordered_set<value_type, hasher, key_equal> s(container.begin(), container.end());
container.assign(s.begin(), s.end());
  1. (alt) Erase duplicates (and conserve order in original vector)

Basically the same, but you check if an element can be inserted in the std::unordered_set, and if does it, remove it. Adapted from @yury's answer in What's the most efficient way to erase duplicates and sort a vector?.

std::unordered_set<value_type, hasher, comparator> s;

const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};

container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());

Generic templated function:

template<typename key_equal_t, bool order_conservative, typename container_t>
void remove_duplicates(container_t& container)
{
    using value_type = typename container_t::value_type;

    // Make all hashes collide so comparator_t::operator() is used to determine element equivalence
    struct hasher
    {
        std::size_t operator()(const value_type&) const
        {
            return 0;
        }
    };

    if constexpr(order_conservative)
    {
        std::unordered_set<value_type, hasher, key_equal_t> s;
        const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};
        container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());
    }
    else
    {
        const std::unordered_set<value_type, hasher, key_equal_t> s(container.begin(), container.end());
        container.assign(s.begin(), s.end());
    }
}

Expects you provide key_equal_t (and a bool known a compile time indicating if you care about element being kept in the same order or not). I did not benchmark any of the two branches in this function so I do not know if one is significantly better than another, though this answer seems show manual insertion may be faster.

Example in this use case:

// "Predicate" indicating if two values are considered duplicates or not
struct key_equal
{
    bool operator()(const vec3& a, const vec3& b) const
    {
        // Remove identical vectors and their opposite
        return (a == b) || (-a == b);
    }
};

remove_duplicates<key_equal, order_conservative>(vec);

Test data:

vec3 v1{1, 1, 0};   // Keep
vec3 v2{0, 1, 0};   // Keep
vec3 v3{1, -1, 0};  // Keep
vec3 v4{-1, -1, 0}; // Remove
vec3 v5{1, 1, 0};   // Remove

std::vector vec{v1, v2, v3, v4, v5};

Test 1: non-order-conservative:

// ...<print vec values>
remove_duplicates<comparator, false>(vec);
// ... <print vec values>

Output (before / after):

(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0)
(1, -1, 0) (0, 1, 0) (1, 1, 0) 

Test 2: order-conservative:

// ... <print vec values>
remove_duplicates<comparator, true>(vec);
// ... <print vec values>

Output (before / after):

(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0) 
(1, 1, 0) (0, 1, 0) (1, -1, 0) 
  • Related