How to remove duplicates in a vector based on an equality predicate?-CodePudding

I have a struct which roughly looks like that:

struct vec3
{
    int x;
    int y;
    int z;

    constexpr bool operator==(const vec3& v) const
    {
        return (x == v.x) && (y == v.y) && (z == v.z);
    }

    constexpr vec3 operator-() const
    {
        return {-x, -y, -z};
    }
};

I then generate a std::vector of vec3 with random values for each coordinates. The function in which it is used requires that no couple of values {v1, v2} in that vector fills v1 == -v2. I obviously need that definition of operator== elsewhere in code, otherwise that problem would be trivial.

I first attempted std::set and std::sort std::unique, but could not find any way to have a struct filing named requirements Compare for that application (which is needed for both approaches).

How can I proceed?

Note:

This is somewhat different from Removing duplicates from a non-sortable vector in which pointers are sorted, and also from C how to remove duplicates from vector of Class type? in which the elements are sortable according to some criteria (I think).

CodePudding user response：

The function in which it is used requires that no couple of values {v1, v2} in that vector fills v1 == -v2

but could not find any way to have a struct filing named requirements Compare for that application (which is needed for both approaches).

It seems to me that you're trying to solve X, but this is the Y in your XY-problem.

It's fairly simple to implement an ordered comparator that satisfies the equality of -v == v. Simply compare absolute values:

// declare in same namespace as vec3 for ADL
vec3 abs(const vec3& v) {
    return {std::abs(v.x), std::abs(v.y), std::abs(v.z)};
}


struct abs_less {
    template< class T, class U>
    constexpr auto operator()( T&& lhs, U&& rhs ) const
        -> decltype(std::forward<T>(lhs) < std::forward<U>(rhs))
    {
        using std::abs; // for integers
        return abs(lhs) < abs(rhs);
    }
};

This comparator works with both std::set and std::sort std::unique. Example with set:

std::set<vec3, abs_less> example;

Of course, you could overload the operator< and use std::less, but usually I would recommend against unusual operator overloads.

CodePudding user response：

I believe the simplest method would be to use std::unordered_set and exploit its second and third template parameters.

Method

Define a hash function for your class

Define a hasher like that:

struct hasher
{
    std::size_t operator()(const value_type&) const
    {
        return 0;
    }
};

This makes all hash collide and forces std::unordered_set to use KeyEqual to determine objects equality.

Define a key equality predicate

Declare a predicate struct which roughly looks like:

struct key_equal
{
    bool operator()(const value_type& a, const value_type& b) const
    {
        
        return (a == b) || <...>;
    }
};

Where <...> is anything you want to put in your predicate making two values identical ( so here a == b) || -a == b for example).

Erase duplicates

Declare an std::unordered_set which removes duplicates for you:

const std::unordered_set<value_type, hasher, key_equal> s(container.begin(), container.end());
container.assign(s.begin(), s.end());

(alt) Erase duplicates (and conserve order in original vector)

Basically the same, but you check if an element can be inserted in the std::unordered_set, and if does it, remove it. Adapted from @yury's answer in What's the most efficient way to erase duplicates and sort a vector?.

std::unordered_set<value_type, hasher, comparator> s;

const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};

container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());

Generic templated function:

template<typename key_equal_t, bool order_conservative, typename container_t>
void remove_duplicates(container_t& container)
{
    using value_type = typename container_t::value_type;

    // Make all hashes collide so comparator_t::operator() is used to determine element equivalence
    struct hasher
    {
        std::size_t operator()(const value_type&) const
        {
            return 0;
        }
    };

    if constexpr(order_conservative)
    {
        std::unordered_set<value_type, hasher, key_equal_t> s;
        const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};
        container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());
    }
    else
    {
        const std::unordered_set<value_type, hasher, key_equal_t> s(container.begin(), container.end());
        container.assign(s.begin(), s.end());
    }
}

Expects you provide key_equal_t (and a bool known a compile time indicating if you care about element being kept in the same order or not). I did not benchmark any of the two branches in this function so I do not know if one is significantly better than another, though this answer seems show manual insertion may be faster.

Example in this use case:

// "Predicate" indicating if two values are considered duplicates or not
struct key_equal
{
    bool operator()(const vec3& a, const vec3& b) const
    {
        // Remove identical vectors and their opposite
        return (a == b) || (-a == b);
    }
};

remove_duplicates<key_equal, order_conservative>(vec);

Test data:

vec3 v1{1, 1, 0};   // Keep
vec3 v2{0, 1, 0};   // Keep
vec3 v3{1, -1, 0};  // Keep
vec3 v4{-1, -1, 0}; // Remove
vec3 v5{1, 1, 0};   // Remove

std::vector vec{v1, v2, v3, v4, v5};

Test 1: non-order-conservative:

// ...<print vec values>
remove_duplicates<comparator, false>(vec);
// ... <print vec values>

Output (before / after):

(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0)
(1, -1, 0) (0, 1, 0) (1, 1, 0)

Test 2: order-conservative:

// ... <print vec values>
remove_duplicates<comparator, true>(vec);
// ... <print vec values>

Output (before / after):

(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0) 
(1, 1, 0) (0, 1, 0) (1, -1, 0)