Home > Software engineering >  Strange performance behavior with vector vs unordered_set
Strange performance behavior with vector vs unordered_set

Time:12-01

I am evaluating frequencies using sample data in the snipped file below.

I have noticed that with an unordered list, the evaluation takes less than a second to return a result. However, with a vector, it takes almost a whole minute to evaluate it!

There are several factors I considered:

  • Size of the data
  • The data itself

After several experiments, I found that if I took out the 2nd to last data (-6) the performance is almost identical and results are returned for both in less than a second!

resultimmediate

However, if I include the -6, the vector evaluation takes too long!

resultbad

I tried changing the number like -5, -4, etc. and the performance was actually pretty good!

resultgood

for some reason, only -6 before the last data/number ( 125503) in the file seems to be affecting the vector performance...what's going on?

Note: of course, i tried running them individually too by commenting out the unorderedlist logic and then the vector logic, same behavior

code:

#include <algorithm>
#include <unordered_set>
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

vector<int> scanFile(ifstream &file) {
    vector<int> scannedFile;
    string str;
    
    while (getline(file, str)) {
        scannedFile.push_back(stoi(str));
    }
    
    return scannedFile;
}

int main() {
    ifstream inputFile;
    vector<int> fileInfo;
    string str = "";
    
    inputFile.open("file.txt");
    
    fileInfo = scanFile(inputFile);
    inputFile.close();
  
    int Occurrences = 0;
    unordered_set<int> unordrdList; //results are immediate, even with -6!
    bool found = false;
    
    unordrdList.insert(Occurrences);
    
    while (!found) {
        for (int n : fileInfo) {
          Occurrences  = n;
          found = unordrdList.find(Occurrences) != unordrdList.end();
          
          unordrdList.insert(Occurrences);
          
          if (found) {
            cout << "Using Unordered_Set: The 2nd showing #: " << to_string(Occurrences) << endl;
            break;
          }
        }
    }
    
    int Occurrnce = 0;
    vector<int> vectr; //result takes too long with -6 present in the file before 2nd to last line!
    bool found2 = false;
    
    vectr.push_back(Occurrnce);
    
    while (!found2) {
        for (int n : fileInfo) {
          Occurrnce  = n;
          found2 = find(vectr.begin(), vectr.end(), Occurrnce) != vectr.end();
          
          vectr.push_back(Occurrnce);
          
          if (found2) {
            cout << "Using Vector: The 2nd showing #: " << to_string(Occurrnce) << endl;
            break;
          }
        }
    }
}

text file:

-5
-2
 1
 14
 7
 5
-14
-4
-5
-12
 7
-5
 17
 5
-13
-12
 15
 22
-5
-6
-12
 20
 4
 2
 17
-1
 18
-7
-1
-17
 11
-12
-5
-2
 9
 2
-6
-17
-1
 2
-3
 15
 19
 9
-8
 13
 1
 11
 16
 3
-16
-7
-15
-15
 12
 16
 18
 1
-9
 16
-9
-19
 17
 1
-15
 13
-9
-8
-1
 7
 17
 13
 15
-17
-3
 12
-10
 5
 4
-16
 15
 3
 19
 1
-2
 19
-16
-11
 4
-10
-8
 13
 13
 19
-6
-19
 1
-4
 18
 15
 16
-18
 12
 3
-9
 8
 3
-9
 11
 4
 8
 1
 6
-10
-3
 15
 16
 15
 12
-14
 4
-14
-7
-3
 14
 4
 3
-6
-7
 11
-2
-18
 17
-3
 14
 18
 6
 8
 18
-13
-7
 17
-18
-10
-15
 13
-16
 5
-10
 15
-8
-4
 14
 13
-10
-14
 7
 5
 13
-4
 12
 3
 14
-7
 6
 12
 17
-18
 3
-6
-3
-14
-1
 13
 19
 3
 9
 4
 11
 12
 3
 1
 14
 7
-17
 12
 11
 6
-7
 4
-11
-1
 6
-19
-10
-16
-2
-19
 7
 9
-3
 5
 12
 15
 8
 11
 1
 8
 15
-9
-9
-8
-10
-3
-16
 7
-17
 9
-5
 16
-15
-4
 15
-5
-25
-6
 1
 4
 17
 19
-13
 17
 7
 19
 2
 4
 10
 16
-9
 19
 13
 3
-10
 9
 5
 1
 18
-11
-14
-4
-5
-13
-7
-12
 2
 3
 6
-16
-1
 13
-10
-4
-1
-3
 9
 22
 4
-18
 17
 11
-21
-17
-18
-8
 12
 6
 15
 12
 10
-7
 18
 10
-8
-10
-18
 11
-17
 25
 15
-9
-19
 38
-3
-6
-23
 34
-25
-5
-12
 25
 14
 17
 30
 3
 9
-8
 16
 21
 21
 4
-12
 23
-13
 9
 3
 6
 13
 15
 6
-7
 15
 12
-10
 13
 12
-7
 13
 4
-9
 18
-10
 5
 8
-7
 2
-14
-12
-1
-6
-16
-18
-3
 1
 12
-6
 15
-17
 15
 13
 6
-15
 26
 1
-21
-3
-21
-4
 14
-1
-19
-11
 13
-10
-14
 2
-17
-23
 25
-16
 8
-30
 17
-44
 13
-19
 5
-9
-12
 10
 14
 17
 24
-15
 7
-61
-103
-18
 21
-22
-6
-9
-6
-9
-7
-15
-17
 4
-1
 10
 8
 14
-4
 15
-16
-6
-16
-15
-12
 15
 10
-15
 3
 15
-10
-17
 3
-5
-12
 4
 3
-37
-7
 14
-18
-8
 6
-11
 14
 30
 7
 23
 39
-12
 11
-8
 11
-1
 56
-28
-12
 7
 34
 47
 21
 88
 61
 18
 10
-99
-287
 26
-858
-209
-61255
 2
-7
-8
 12
 18
 17
-14
-2
 14
 4
 3
 3
-15
 21
 3
 20
 1
 3
-1
-25
-2
 7
-9
 17
-19
-6
-13
 2
-4
-3
-14
-8
 12
-20
 17
-40
 11
 3
-20
-16
-18
-3
-6
-10
 4
 20
-12
-3
 11
 16
-2
 4
-9
 28
-12
 5
 17
-13
 35
 25
 2
 35
-17
-8
 31
 3
-39
-46
 3
-96
-18
-14
-14
-12
 9
 15
 3
 18
-8
 1
-2
-15
 13
 14
-13
-3
 7
-11
-15
-17
-12
-15
-17
 9
 7
-10
-8
-3
 9
 16
 3
 15
-14
-5
-9
 1
-3
 10
 3
-18
-21
-11
 2
-17
-17
-7
-11
 8
 11
-2
-7
-1
-19
-2
-16
 1
 5
 11
 18
 5
 17
 2
-5
-2
 19
-4
 12
 2
 20
-13
-20
-12
 10
 4
 14
-1
 12
 13
-10
-13
-8
-1
 14
 16
 6
 10
 18
 19
 14
-1
 6
-13
-9
 10
-20
 1
 1
-12
 6
 7
-15
-13
-2
 14
-19
-5
-10
-17
 15
-5
 8
 20
 9
 14
-6
 4
-3
 19
 18
-14
-3
-8
 4
 16
 7
-3
-5
 19
-2
 12
 19
 12
-10
-17
-40
 1
-11
 19
-27
-9
 45
-27
 31
 1
-25
 41
 58
-5
-39
-112
 1
-8
-3
-6
-15
-3
-9
 13
 13
-24
 3
-4
 13
-19
-21
 16
-20
-10
-15
-16
-1
-13
-12
-17
 16
 17
 16
-2
-9
-1
-19
 8
 5
-18
 17
-19
-19
-12
 14
-18
-12
 10
 9
-13
-8
 19
 16
 13
-5
 2
 16
-15
 17
 9
 10
 2
-5
 20
 16
-8
-4
 16
 6
-7
-2
-11
 18
 11
-1
 2
 5
 3
 40
-38
 22
-11
-15
-27
 7
-25
 4
-11
-15
-12
 15
-20
-13
 9
-13
 7
-5
 10
 18
 24
 8
 15
-6
-10
-10
-1
-4
-3
-17
 7
-15
 20
-19
-18
-14
-4
-26
-22
-5
 17
 6
 13
 9
 7
 28
-11
-8
 57
-23
-37
-11
-43
 3
-14
 3
-14
 13
-15
 6
-23
 8
 12
-8
 15
 16
 2
-9
-14
 18
 19
-1
-43
-2
-3
-13
 5
-6
-13
-20
 15
 13
-12
 3
 2
 3
 11
 7
-17
-8
-13
 18
-17
-2
-13
 5
 3
 1
-11
 20
 4
 2
-1
-19
-8
-19
 1
-6
-12
 8
-6
-5
 1
 6
 9
-12
 13
 10
-14
 20
-11
 9
 9
-3
-18
 9
-5
 9
 2
-9
-19
-4
-10
 7
-8
 68
 30
 2
-9
 12
-65
 15
-26
-9
-15
-67
-38
-36
-37
-7
 18
-20
-44
 35
-70
-85
 14
 18
-499
 209
-61690
-5
 15
-4
 3
 5
 6
 15
 16
 3
-4
 12
-14
-7
-9
-15
 2
-1
-11
-21
-8
-2
-15
 19
 7
-10
-19
-3
-14
 4
 9
 13
-16
-14
 2
 10
-19
 6
 16
 11
 7
-14
-9
-21
-7
-12
 2
 12
-6
 125503

CodePudding user response:

find in unordered set is on average O(1) whereas find over vector is O(n). Searching in vector is going to take longer.

CodePudding user response:

Not entirely sure that this is the cause, but the find method of List<> on some platforms is implemented to use a different algorithm with few entries than with larger number of entries (normally usually with a speed or mem usage benefit). So this may explain the jump in performance after a certain entry. However if find() is your main application and you do not have the need for non-unique entries, a SET is simply the better choice because as @Rama mentioned, it has a O(1) complexity. The reason is, it likely uses a hash system which also drastically speeds up the check for uniqueness on every insert (which would effectively be a find() call otherwise).

  •  Tags:  
  • c
  • Related