What is the fastest way to see if the values from a 2 dimensional array are present in 3 other two d-CodePudding

I have 4 integer two dimensional arrays of the form:

{{1,2,3},{1,2,3},{1,2,3}}

The first 3(A,B,C) are of size/shape [1000][3] and the 4th(D) [57100][3].

The combination of integers in the 3 elements sub-arrays in A,B,C are all unique, while combinations of integers in the 3 elements sub-arrays in D are not.

What I have to do is to find in what array from A,B or C a sub-array from D is present and do something with it after, depending in which array I find it.

I tried using nested for loops with if statements and check each array one at a time (A->B->C) and break out when I find it but it takes to long.

Thank you

CodePudding user response：

As suggested by my comments, let's assume your approach is a naive, look at one item at a time in a for loop to see if a match is found.

Instead of doing that, another approach is to store the arrays as a std::set<std::tuple<int, int, int>> and do a lookup on the sets. Doing this reduces the time complexity from linear to logarithmic. For example, a 1000 item set will have at most 10 tests to finally determine if the item exists instead of having to go through all 1000 items.

Here is a somewhat untested example. Note that the data in the arrays has not been populated (the assumption is that the data has been populated into the arrays):

#include <tuple>
#include <set>
#include <array>
#include <iostream>

using TupleSet = std::set<std::tuple<int, int, int>>;

int A[1000][3];
int B[1000][3];
int C[1000][3];
int D[57100][3];

void printTuple(const std::tuple<int,int,int>& ts)
{
    std::cout << std::get<0>(ts) << "," <<
                 std::get<1>(ts) << "," << 
                 std::get<2>(ts) << "\n";
}

// Initialize the tuple set with the 3 column 2D array of items
void initializeTuple(int pArray[][3], int numItems, TupleSet& ts)
{
    for (int i = 0; i < numItems;   i)
        ts.insert({pArray[i][0], pArray[i][1], pArray[i][2]});
}

int main()
{
   // Assume that A, B, C, and D have been filled in with data
   //...
   static constexpr char *arrayName = "ABC";

   // Declare an array of 3 tuple sets that represent A,B,C 
   std::array<TupleSet, 3> allTuples;

   // This is for the "special" D Array
   TupleSet tD;  

   //Store the data in the tuples
   initializeTuple(A, 1000, allTuples[0]);
   initializeTuple(B, 1000, allTuples[1]);
   initializeTuple(C, 1000, allTuples[2]);
   initializeTuple(D, 57000, tD);

   // Now go through each tuple in D to see if it's
   // in A, B, or C
   for (auto& t : tD)
   {
       // print the tuple we're searching for
       printTuple(t);

       // Now see if it's in A, B, or C   
       for (int i = 0; i < 3;   i)
       {
           auto iter = allTuples[i].find(t);
           if ( iter != allTuples[i].end() )
           {  
               std::cout << "Found a match in Array " << 
                             arrayName[i] << "\n";
               break;
           }
        }  
     }
  }

Note how each 2D array is placed in a tuple set. Also, since a std::set does not store duplicates, the D array will have all of its duplicates removed when placed in the tuple set tD. Thus right there, the number of items that D represents will be minimized.

However, there are some issues you should be aware of:

The initial setup time of the tuples is part of this solution. However the setup time is done only once.
It may be even faster to use a std::unordered_set, but that would require you to write a hash function, and I leave that as an exercise to you.