c get indices of duplicating rows in 2D array-CodePudding

The task is following: find indices of duplicating rows of 2D array. Rows considered to be duplicated if 2nd and 4th elements of one row are equal to 2nd and 4th elements of another row.The simplest way to do it is something like that:

std::unordered_set<int> result;
for (int i = 0; i < rows_count;   i)
{
    for (int j = i   1; j < rows_count;   j)
    {
        if (arr[i][2] == arr[j][2] && arr[i][4] == arr[j][4])
        {
            result.push_back(j);
        }
    }
}

But if rows_count is very large this algorithm is too slow. So my question is there any way to get needed indices using some data structures (from stl or other) with only single loop (without nested loop)?

CodePudding user response：

You could take advantage of the properties of a `std::unordered_set.

A small helper class will further ease up things.

So, we can store in a class the 2nd and 4th value and use a comparision function to detect duplicates.

The std::unordered_set has, besides the data type, 2 additional template parameters.

A functor for equality and
a functor for calculating a hash function.

So we will add 2 functions to our class an make it a functor for both parameters at the same time. In the below code you will see:

std::unordered_set<Dupl, Dupl, Dupl> dupl{};

So, we use our class additionally as 2 functors.

The rest of the functionality will be done by the std::unordered_set

Please see below one of many potential solutions:

#include <vector>
#include <unordered_set>
#include <iostream>

struct Dupl {
    Dupl() {}
    Dupl(const size_t row, const std::vector<int>& data) : index(row), firstValue(data[2]), secondValue(data[4]){};
    
    size_t index{};
    int firstValue{};
    int secondValue{};

    // Hash function
    std::size_t operator()(const Dupl& d) const noexcept {
        return d.firstValue   (d.secondValue << 8)   (d.index << 16);
    }
    // Comparison
    bool operator()(const Dupl& lhs, const Dupl& rhs) const {
        return  (lhs.firstValue == rhs.firstValue) and (lhs.secondValue == rhs.secondValue);
    }
};

std::vector<std::vector<int>> data{
    {1,   2,   3,  4,   5,  6,  7,  8,  9,  10},              // Index 0
    {2,   3,   4,  5,   6,  7,  8,  9,  10, 11},              // Index 1
    {3,   4,  42,  6,  42,  8,  9,  10, 11, 12},              // Index 2   ***
    {4,   5,   6,  7,   8,  9,  10, 11, 12, 13},              // Index 3
    {5,   6,  42,  8,  42,  10, 11, 12, 13, 14},              // Index 4   ***
    {6,   7,   8,  9,  10,  11, 12, 13, 14, 15},              // Index 5
    {7,   8,   9, 10,  11,  12, 13, 14, 15, 16},              // Index 6
    {8,   9,  10, 11,  12,  13, 14, 15, 16, 17},              // Index 7
    {9,  10,  42, 12,  42,  14, 15, 16, 17, 18},              // Index 8   ***
    {10, 11,  12, 13,  14,  15, 16, 17, 18, 19},              // Index 9

};

int main() {

    std::unordered_set<Dupl, Dupl, Dupl> dupl{};

    // Find the unique rows
    for (size_t i{}; i < data.size();   i)
        dupl.insert({i, data[i]});

    // Show some debug output
    for (const Dupl& d : dupl) {
        std::cout << "\nIndex:\t " << d.index << "\t\tData: ";
        for (const int i : data[d.index]) std::cout << i << ' ';
    }
}