How to load all unique words of a specific file in an array and display them C++

Time:10-30

I'm stuck on how to find the unique words in a file. All I've done so far is load all the words and store them in an array, like below:

char *arr=new char[100];
char ch;
fstream my_file("name.txt");
if (!my_file) {
    cout << "No such file";
}
else {
    while(my_file.eof()==0){
        my_file.get(ch);
        arr[i]=ch;
        i++;
    }
}
for(int j=0;j<i;j++){
    cout<<arr[j];
}
my_file.close();

But now I'm confused about how to find the unique words. Any guidance would be appreciated.

CodePudding user response:

Your [unnecessarily] heap-allocated array is designed to hold 100 characters. You make it sound like it's supposed to hold many strings. There is a problem there in that your types don't really match. An array of characters can hold a single C-string. You either want an array of std::string, or a two-dimensional array of characters to hold many C-strings.

When you get homework, break the big task into smaller tasks. Break the smaller tasks into even smaller tasks. Keep going until you have a list of trivial steps. Do the trivial stuff, and eventually you have a program.

Your list of steps looks like this to me:

  1. open a file
  2. read words from the file
  3. determine if the word is unique or a duplicate
    • Add to an array of words if unique
    • Ignore if a duplicate
  4. Print the array of unique words

You seem stuck on Steps 2 and 3. Step 2 is easily fixed by fixing your types. std::vector<std::string> is an ideal type at your level of learning. std::string is much easier to work with than C-strings (null character terminated character arrays). std::vector is an array-like class that can change its size when needed.

The next problem-solving technique would be to start with what you know. If you feel good about the other steps, write them out. Give yourself a dummy file with like five words in it and make sure you can do everything except identify uniqueness.

Can you open the file, read all five words into an array, and then print the array? If yes, you're already 75% complete.

For the last task, let's open a new file and try to work this out independently from our [mostly] working submission. Hard code an array with five words, and put a duplicate in there. Can you create a new array of just the unique words? What would that look like? Maybe you automatically add the first word to the array, since it is automatically unique by virtue of being the first thing processed. Now, for the remainder of the hard-coded words, how do you tell if the word is unique? Maybe you need to compare it against every element in the unique word array. That's a loop and a comparison. If it compares to be unique, we'll add it to the array. If it compares as a duplicate, we do nothing.
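
The loop-and-compare idea described above could be sketched like this. The function names contains and keepUnique are hypothetical, and this is just one way to write it, not the only one.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: true if `word` already appears in `seen`.
bool contains(const std::vector<std::string>& seen, const std::string& word) {
  for (const std::string& s : seen) {
    if (s == word) return true;  // found a match, so it's a duplicate
  }
  return false;
}

// Hypothetical helper: keeps the first occurrence of each word, in input order.
std::vector<std::string> keepUnique(const std::vector<std::string>& words) {
  std::vector<std::string> unique;
  for (const std::string& w : words) {
    // The first word is always added, because `unique` starts out empty.
    if (!contains(unique, w)) unique.push_back(w);
  }
  return unique;
}
```

Calling keepUnique on the hard-coded list {"cat", "rat", "cat", "bat", "rat"} gives {"cat", "rat", "bat"}. Every word is compared against every unique word found so far, which is fine for homework-sized inputs but gets slow for large files.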

Once that code is working, adapt it to fit into your submission. Writing the code is typically not the hardest part of getting a working program. It's thinking through the algorithm. I understand that at a beginner level, writing the code is also a challenge, but that's why you have homework. But even for a beginner, writing the code should never be as difficult as thinking through the process.

Now, the downside to beginner assignments is usually that the C++ Standard Library makes them trivial. Here's an example:

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <unordered_set>

int main(int argc, char* argv[]) {
  if (argc != 2) return 1;

  std::ifstream fin(argv[1]);
  if (!fin) {
    std::cerr << "Error opening " << argv[1] << '\n';
    return 2;
  }

  std::unordered_set<std::string> uniqueWords;
  std::copy(std::istream_iterator<std::string>(fin),
            std::istream_iterator<std::string>(),
            std::inserter(uniqueWords, uniqueWords.begin()));

  for (const auto& word : uniqueWords) {
    std::cout << word << '\n';
  }
}

This code "skips" a lot of the work you'll be doing manually. std::unordered_set is a data structure that can only contain unique values. Attempting to insert a duplicate fails. std::copy is able to take things directly from the file (using std::istream_iterator) and insert them directly into the std::unordered_set (with the help of std::inserter).

The printing is accomplished with a range-based for loop.
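
If you want to see the "inserting a duplicate fails" behavior for yourself, here is a small sketch. The helper name addIfNew is made up. It relies on the fact that std::unordered_set::insert returns a pair whose second member is false when the value was already present.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical helper: tries to add `word` to `seen` and reports
// whether it was actually new.
bool addIfNew(std::unordered_set<std::string>& seen, const std::string& word) {
  // insert() returns {iterator, bool}; the bool is true only if the
  // element was inserted, i.e. it was not already in the set.
  return seen.insert(word).second;
}
```

Calling addIfNew(seen, "cat") twice on an empty set returns true the first time and false the second, and the set still holds a single "cat" afterwards.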

Here's a sample run of the std::unordered_set program. Given the following file (you should have provided sample inputs and expected outputs):

cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat bat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat mat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat pat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat vat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat
cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat
rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat rat cat

The output is:

❯ ./a.out name.txt
vat
pat
mat
bat
rat
cat

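I know this is correct because I created the file. Note that std::unordered_set makes no guarantee about iteration order, so the same words may print in a different order on another compiler or system.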

CodePudding user response:

The below program shows how to count all the unique words in a given text file and then display how many times each of those words occurred:

#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

int main() {
    std::string line, word;
    // Maps each word to the number of times it occurs.
    std::map<std::string, int> wordCount;

    std::ifstream inFile("input.txt");

    if (inFile) {
        // Read the file line by line, then split each line into words.
        while (std::getline(inFile, line)) {
            std::istringstream ss(line);
            while (ss >> word) {
                // operator[] inserts a zero count the first time a word is seen.
                wordCount[word]++;
            }
        }
    } else {
        std::cout << "file cannot be opened" << std::endl;
    }

    inFile.close();
    std::cout << "Total unique words are: " << wordCount.size() << std::endl;
    for (const std::pair<const std::string, int>& pairElement : wordCount) {
        std::cout << pairElement.first << "-" << pairElement.second << std::endl;
    }
    return 0;
}

The output of the above program can be seen here.

Note that you don't need an array because std::map would suffice as shown above. The input.txt file can also be found at the link mentioned above.
