Home > Enterprise >  vector find is returning the wrong value
vector find is returning the wrong value

Time:09-11

The purpose of this program is to read strings from a .txt file and put all the non-repeating words into a set. I did this by putting all the words into a vector and then attempted to go through it and add only the unique words to the set and deleting the repeating words from the vector. Here is my complete code with the part I am having trouble with at the bottom.

#include <iostream>
#include <fstream>
#include <set>
#include <vector>
#include <iterator>    
#include <algorithm>

using namespace std;
int main ()
{
//create data types
set<string> non_duplicate;
vector<string> vectorstring;
vector<string>::iterator it;

ifstream file;
//open file return 1 if can't be opened
file.open ("txt.txt");
if (!file.is_open()) return 1;
//make variable for word
string word;
//take words one at a time from file and add to vector/
while (file >> word)
{
    vectorstring.push_back(word);
}
//check vector from repeats and add to set if not
do
{
    string temp = vectorstring[0];
    vectorstring.erase(vectorstring.begin());
    bool duplicate = 0;
    check:
    if (vectorstring.size()  == 0)
    {
        non_duplicate.insert (temp);
        break;
    }
    it = find(vectorstring.begin(), vectorstring.end(), temp);
    if (*it != temp && duplicate != 1)
    {
        non_duplicate.insert (temp);
    }
    else if (*it == temp)
    {
        vectorstring.erase(it);
        duplicate = 1;
        goto check;
    }
} while (!vectorstring.empty());

//output results
cout << "List of non-repeating words: ";
for (auto x = non_duplicate.begin(); x !=non_duplicate.end(); x  )
{
    cout << *x << " ";
}
cout << endl;

This is the bit of code causing me problems. Everytime I get towards the last 3ish elements in the vector the find function and "it" do not give me the correct output. For instance if the temp value being searched for is "ben" and the last of these words has been deleted the value of it does not reset and stays "ben" after going through find making it seem as though there is still a value of "ben" when there is not. I'm not sure why this is happening since it works on every value except those near the end?

do
{
    string temp = vectorstring[0];
    vectorstring.erase(vectorstring.begin());
    bool duplicate = 0;
    if (vectorstring.size()  == 0)
    {
        non_duplicate.insert (temp);
        break;
    }
    check:
    it = find(vectorstring.begin(), vectorstring.end(), temp);
    if (*it != temp && duplicate != 1)
    {
        non_duplicate.insert (temp);
    }
    else if (*it == temp)
    {
        vectorstring.erase(it);
        duplicate = 1;
        goto check;
    }
} while (!vectorstring.empty());

CodePudding user response:

To get a std::set with unique entries from a std::vector you merely have to construct the set. A set contains only unique entries by definition:

#include <set>
#include <vector>
#include <iostream>

int main() {
    std::vector<int> x{1,1,2,2,3,3};
    std::set<int> non_duplicate{x.begin(),x.end()};
    for (const auto n : non_duplicate) std::cout << n << " ";
}

Output:

 1 2 3

Your code is too complicated. I spotted at least one major issue:

string temp = vectorstring[0];
vectorstring.erase(vectorstring.begin());
//....
it = find(vectorstring.begin(), vectorstring.end(), temp);
if (*it != temp && duplicate != 1)

When the first element vectorstring[0] appear only once in the vector, then find will return vectorstring.end() (because the one appearance you erased). Dereferencing the end iterator as in *it != temp invokes undefined behavior.

CodePudding user response:

std::set will store elements only once. You could simply store everything in the set directly without getting a vector involved.

std::string word;
while (file >> word)
{
    non_duplicate.insert(word);
}

Furthermore dereferencing the end iterator is undefined behaviour. If no match is found std::find returns the second iterator, you'll dereference the end iterator of the vector in the if conditions.

Moreover the use of goto should be avoided, since it can easily result in code that is hard to maintain. In your case it wouldn't be hard to rewrite the code to use a second nested loop instead.

Also the loop does make the assumption that the vector is not initially empty.

Here's a rewrite of your loop though that would work:

while(!vectorstring.empty())
{
    std::string temp = std::move(vectorstring[0]); // don't make a copy; we'll erase the object anyways
    vectorstring.erase(vectorstring.begin());

    // clear the duplicates from the vector
    /* Note: We could just use the following more efficient one-liner for this
    vectorstring.erase(std::remove(vectorstring.begin(), vectorstring.end(), temp), vectorstring.end());
    */

    for (auto it = std::find(vectorstring.begin(), vectorstring.end(), temp); it != vectorstring.end(); it = std::find(vectorstring.begin(), vectorstring.end(), temp))
    {
        vectorstring.erase(it);
    }
}
  • Related