Strange performance behavior with vector vs unordered_set

I am evaluating frequencies using sample data from the file shown below.

I have noticed that with an unordered_set, the evaluation takes less than a second to return a result. However, with a vector, it takes almost a whole minute!

There are several factors I considered:

Size of the data
The data itself

After several experiments, I found that if I remove the second-to-last value (-6), the performance is almost identical, and both versions return results in less than a second!

However, if I include the -6, the vector evaluation takes too long!

I tried changing the number to -5, -4, etc., and the performance was actually pretty good!

For some reason, only a -6 just before the last number (125503) in the file seems to affect the vector performance. What's going on?

Note: of course, I also tried running them individually, commenting out the unordered_set logic and then the vector logic; the behavior was the same.

code:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>        // std::string, std::getline, std::stoi, std::to_string
#include <unordered_set>
#include <vector>

using namespace std;

vector<int> scanFile(ifstream &file) {
    vector<int> scannedFile;
    string str;
    
    while (getline(file, str)) {
        scannedFile.push_back(stoi(str));
    }
    
    return scannedFile;
}

int main() {
    ifstream inputFile;
    vector<int> fileInfo;
    string str = "";
    
    inputFile.open("file.txt");
    
    fileInfo = scanFile(inputFile);
    inputFile.close();
  
    int Occurrences = 0;
    unordered_set<int> unordrdList; //results are immediate, even with -6!
    bool found = false;
    
    unordrdList.insert(Occurrences);
    
    while (!found) {
        for (int n : fileInfo) {
          Occurrences  = n;
          found = unordrdList.find(Occurrences) != unordrdList.end();
          
          unordrdList.insert(Occurrences);
          
          if (found) {
            cout << "Using Unordered_Set: The 2nd showing #: " << to_string(Occurrences) << endl;
            break;
          }
        }
    }
    
    int Occurrnce = 0;
    vector<int> vectr; //result takes too long when -6 is the second-to-last line of the file!
    bool found2 = false;
    
    vectr.push_back(Occurrnce);
    
    while (!found2) {
        for (int n : fileInfo) {
          Occurrnce  = n;
          found2 = find(vectr.begin(), vectr.end(), Occurrnce) != vectr.end();
          
          vectr.push_back(Occurrnce);
          
          if (found2) {
            cout << "Using Vector: The 2nd showing #: " << to_string(Occurrnce) << endl;
            break;
          }
        }
    }
}

text file:

CodePudding user response：

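find on an unordered_set is O(1) on average, whereas find over a vector is O(n), so searching the vector takes longer. Because the vector also grows with every value you push, each search scans more elements than the last, and the total work grows roughly quadratically with the number of values processed before a repeat is found.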
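To make the O(n) cost concrete, here is a minimal sketch that counts how many element comparisons the search-then-push pattern performs when a vector is the "seen" store. The helper names linearSearchComparisons and distinctValues are made up for illustration and are not part of the code in the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the element comparisons std::find performs when
// each value is searched for in a growing "seen" vector and then pushed.
// Stops at the first repeated value, like the loop in the question.
std::size_t linearSearchComparisons(const std::vector<int>& values) {
    std::vector<int> seen;
    std::size_t comparisons = 0;
    for (int v : values) {
        auto it = std::find(seen.begin(), seen.end(), v);
        if (it != seen.end()) {
            // Match: std::find inspected every element up to and including the hit.
            comparisons += static_cast<std::size_t>(it - seen.begin()) + 1;
            break;
        }
        // No match: std::find inspected every element stored so far.
        comparisons += seen.size();
        seen.push_back(v);
    }
    return comparisons;
}

// Hypothetical helper: builds the values 0, 1, ..., n-1 (no repeats).
std::vector<int> distinctValues(int n) {
    std::vector<int> values;
    values.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        values.push_back(i);
    }
    return values;
}
```

Calling linearSearchComparisons(distinctValues(1000)) gives 499500, which is 1000 * 999 / 2. An unordered_set would need roughly one hash lookup per value instead, so the gap widens quickly as more values are processed before the first repeat.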

CodePudding user response：

I'm not entirely sure this is the cause, but on some platforms the find method of a list-like container is implemented with a different algorithm for a few entries than for a large number of entries (usually for a speed or memory benefit), which might explain the jump in performance after a certain entry. However, if find() is your main operation and you don't need non-unique entries, a set is simply the better choice because, as @Rama mentioned, its lookup is O(1) on average. The reason is that it uses hashing, which also drastically speeds up the uniqueness check on every insert (which would otherwise effectively be a find() call).
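As a sketch of that last point: std::unordered_set::insert returns a pair whose second member is false when the value was already present, so the uniqueness check and the insert can be a single operation. The function name firstRepeat and its out-parameter are assumptions made for this example, not something taken from the question's code.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns true and stores the first repeated value in
// `repeated` if any value occurs twice in `values`; returns false otherwise.
bool firstRepeat(const std::vector<int>& values, int& repeated) {
    std::unordered_set<int> seen;
    seen.reserve(values.size());  // optional: avoids rehashing while inserting
    for (int v : values) {
        // insert() reports through .second whether the value was newly added,
        // so no separate find() call is needed.
        if (!seen.insert(v).second) {
            repeated = v;
            return true;
        }
    }
    return false;
}
```

For example, with the values 3, 1, 4, 1, 5 the function returns true and sets repeated to 1; with 1, 2, 3 it returns false and leaves repeated untouched.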