How can I change the words in the array to lowercase and sort the unique words alphabetically?

I have created a program that must sort the words and search for the unique ones. It should also count the number of occurrences of these words in the list. The list of unique words and the frequencies can be stored in dynamic arrays. The program saves the concordance list (list of unique words), along with the frequency of occurrence, in a data file whose name the user is prompted to provide. I cannot use vectors or any existing data structures such as the list class.

This is the question:

You must develop a solution (software) that meets these specifications:

- The program prompts the user for the name of the input text file where the text is stored. The program must print an error message in case errors occur while opening the files. The program must read words and store them into an array of strings. Punctuation characters must be ignored. All alphabetical characters must be converted to lower case characters to eliminate case sensitivity.
- The program must sort the words and search for the unique ones. It should also count the number of occurrences of these words in the list. The list of unique words and the frequency can be stored in dynamic arrays.
- The program saves the concordance list (list of unique words), along with the frequency of occurrence, in a data file that the user is prompted to provide. The program must print a confirmation message on the output screen once the data is stored in the file.
- The program must print the concordance list on the output screen.

Here are a few notes about your solution:

- The program must be designed in a modular fashion. Multiple reusable functions will be implemented to solve the problem (such as a function to search for a string, a function to sort the elements of an array, a function to write the concordance in an output file, a function to return the next word from the input file, etc.).
- All nonalphabetical characters must be treated as delimiters for separating words in the text file.
- The size of the concordance (the total number of unique words) is unknown at compilation time. Dynamic memory allocation must be used to adjust the size of the concordance list at run-time, as needed.
- Static array dimensions should be given as symbolic constants. Such definitions should be used to declare arrays.
- When passing a one dimensional array to a function, pass the dimension as an argument. When passing a 2D array to a function, pass the row dimension as an argument. In the case of 2D arrays, it is necessary to use the symbolic constant for the column dimension in the parameter declaration of a function definition.
- You are NOT allowed to use any existing data structures (such as the list class) in the solution. Instead, you should create the concordance as a dynamic array of strings (or a dynamic 2D array of characters).

Here is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <iomanip>


#define SIZE 100


void findUnique();

using namespace std;

int main()
{
    string array[SIZE];
    int loop = 0;
    string line;
    string letter;
    ifstream file1;
    file1.open("readText.txt");
    if (file1.fail())
    {
        cerr << "error opening the file" << endl;
        exit(-1);
    }
    else if (file1.is_open()) //if the file is open
    {
        while (!file1.eof()) //while the end of file is NOT reached
        {
            file1 >> line;
            getline(file1, line); //get one line from the file

            line.erase(std::remove_if(line.begin(), line.end(), ispunct), line.end());
            array[loop] = line;
            cout << array[loop] << endl; //and output it
            loop++;

        }
        
    }

    findUnique();
    return (0);
    
}

void findUnique()
{

    string filename;
    cout << "Enter the name of the file" << endl;
    cin >> filename;
    ifstream file;
    file.open(filename);
    if (!file)
    {
        cout << "Error: Failed to open the file.";
    }
    else
    {
        string stringContents;
        int stringSize = 0;

        // find the number of words in the file
        while (file >> stringContents)
            stringSize++;

        // close and open the file to start from the beginning of the file
        file.close();
        file.open(filename);


        string* mainContents = new string[stringSize];   // dynamic array for strings found
        int* frequency = new int[stringSize];           // dynamic array for frequency
        int uniqueFound = 0;                            // no unique string found

        for (int i = 0; i < stringSize && (file >> stringContents); i++)
        {
            //remove trailing punctuations 
            while (stringContents.size() && ispunct(stringContents.back()))
                stringContents.pop_back();

            // process string found 
            bool found = false;
            for (int j = 0; j < uniqueFound; j++)
                if (mainContents[j] == stringContents) {  // if string already exist
                    frequency[j]++;     // increment frequency 
                    found = true;
                }
            if (!found) {   // if string not found, add it !  
                mainContents[uniqueFound] = stringContents;
                frequency[uniqueFound++] = 1;   // and increment number of found
            }
        }
        // display results
        cout << "Word" << setw(20) << "Frequency\n";
        for (int i = 0; i < uniqueFound; i++)
        {
            cout << mainContents[i] << "\t\t" << frequency[i] << endl;
            ofstream file2;
            file2.open("writeText.txt");
            file2 << mainContents[i] << "\t\t" << frequency[i] << endl;
        }
    }
    
}

CodePudding user response：

You can simplify your program by using std::map as shown below. The program below finds the unique words in the input.txt file and keeps track of the count corresponding to each of them.

#include <iostream>
#include <map>
#include <sstream>
#include <fstream>
#include <string>
int main() {
    
    //this map maps each word in the file to their respective count
    std::map<std::string, int> stringCount;
    std::string word, line;
    int count = 0;//this count the total number of words
    
    std::ifstream inputFile("input.txt");
    if(inputFile)
    {
        while(std::getline(inputFile, line))//go line by line
        {
            std::istringstream ss(line);
            while(ss >> word)//go word by word 
            {
                //increment the count 
                stringCount[word]++;
            }
        }
    }
    else 
    {
        std::cout<<"File cannot be opened"<<std::endl;
    }
    
    inputFile.close();
    
    std::cout<<"Total number of unique words are:"<<stringCount.size()<<std::endl;
    for(std::pair<std::string, int> pairElement: stringCount)
    {
        std::cout<<pairElement.first<<" : "<<pairElement.second<<std::endl;
      
    }
    return 0;
}

The output of the program can be seen here.

The above program prints the number of unique words in the file and the count (frequency) corresponding to each unique word.
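The program above counts tokens exactly as they appear in the file, so "The" and "the," would end up as separate keys. A minimal sketch of one way to normalize each token before counting is shown here. The helper names `normalizeWord` and `countWords` are hypothetical, and the sketch assumes that simply dropping non-alphabetic characters is acceptable (so "don't" becomes "dont" rather than being split into two words, which is stricter in the assignment).

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: keep only letters and convert them to lower case.
std::string normalizeWord(const std::string& token) {
    std::string result;
    for (unsigned char c : token) {          // unsigned char keeps std::isalpha well defined
        if (std::isalpha(c))
            result += static_cast<char>(std::tolower(c));
    }
    return result;
}

// Hypothetical helper: count normalized words read from any input stream.
std::map<std::string, int> countWords(std::istream& in) {
    std::map<std::string, int> counts;
    std::string token;
    while (in >> token) {                    // whitespace-separated tokens
        const std::string word = normalizeWord(token);
        if (!word.empty())                   // skip tokens that were pure punctuation
            ++counts[word];
    }
    return counts;
}
```

Because std::map keeps its keys ordered, iterating over the result already yields the unique words in alphabetical order. The function accepts any std::istream, so it can be tried with a std::istringstream before pointing it at a file.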

CodePudding user response：

I will show you a solution doing full dynamic memory management for everything.

Even for strings.

So, I will use no library functions and no C++ containers. Just plain code. This needs many helper functions.

The dynamic memory management will be done with new in C++. The recurring problem is that we may not know the size to be allocated in advance.

But this can be handled with an assumed initial array size. If we see that this array size is not sufficient, then we allocate new memory with a bigger size (often double the previous size) and copy all elements from the previously used memory to the new memory. Then we delete the old memory and assign the newly allocated memory to the old pointer.
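The grow-and-copy step described above can be sketched in isolation. This is only an illustration under stated assumptions: the helper name `appendWord` and the starting capacity of 4 are made up here, and the elements are std::string for brevity, whereas the full solution manages raw char arrays.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append a value to a heap-allocated array,
// doubling the capacity whenever the array is full.
void appendWord(std::string*& data, std::size_t& count, std::size_t& capacity,
                const std::string& value) {
    if (count == capacity) {
        // Assumed growth policy: start at 4, then double
        const std::size_t newCapacity = (capacity == 0) ? 4 : capacity * 2;
        std::string* bigger = new std::string[newCapacity];

        // Copy all elements from the old memory to the new memory
        for (std::size_t k = 0; k < count; ++k)
            bigger[k] = data[k];

        // Delete the old memory (deleting a null pointer is a no-op)
        delete[] data;

        // Reassign the new memory to the old pointer
        data = bigger;
        capacity = newCapacity;
    }
    data[count++] = value;
}
```

The caller starts with a null pointer and zero count and capacity, and remains responsible for the final `delete[]`.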

This leads to many, many lines of repetitive code, which is why std::string and std::vector were invented.

However, it is your requirement to use this old and outdated approach. Nowadays it is strongly discouraged to use new and delete, raw pointers for owned memory, and C-style arrays. In real life, you should never do this.

So, the basic functions. We start with reading the words from a file. We use a loop with two states: either we wait for the start of a word or we wait for the end of a word, depending on the character that we read from the file. Once we have found the beginning of a word, we copy character by character to a new string.

When we reach the end of a word, we store the word just read in our array of words and wait for the beginning of the next word.

We do this until the end of the file.
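The two-state loop can be sketched as a small "next word" function. This is a simplified illustration, not the full solution: the name `nextWord` is hypothetical, it returns one word per call instead of filling an array, and it uses std::string instead of a manually grown char buffer.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read the next lower-cased word from the stream.
// Returns false when no further word is available.
bool nextWord(std::istream& in, std::string& word) {
    word.clear();
    bool insideWord = false;   // false: waiting for start of word, true: waiting for end
    char c;
    while (in.get(c)) {
        const bool isUpper = (c >= 'A' && c <= 'Z');
        const bool isLower = (c >= 'a' && c <= 'z');
        if (isUpper || isLower) {
            // A letter either starts a word or continues the current one
            insideWord = true;
            if (isUpper)
                c = static_cast<char>(c + ('a' - 'A'));
            word += c;
        } else if (insideWord) {
            // First delimiter after a word: the word is complete
            return true;
        }
        // Otherwise: a delimiter while still waiting for a word, skip it
    }
    // End of input: a word is complete only if we were inside one
    return insideWord;
}
```

Every non-alphabetical character acts as a delimiter here, so an input such as "it's" yields the two words "it" and "s".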

For sorting we implement a standard bubble sort, where we only swap the pointers to the strings, not the strings themselves.
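As a compact illustration of sorting by swapping pointers only, here is a minimal sketch. The function name `sortWordPointers` is hypothetical, and it uses std::strcmp for the comparison where the full solution hand-writes its own string comparison.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: bubble sort over an array of C strings.
// Only the pointers are exchanged; the characters are never copied.
void sortWordPointers(const char** words, std::size_t count) {
    bool swapped = true;
    while (swapped) {                          // repeat until a pass makes no swap
        swapped = false;
        // "j + 1 < count" avoids unsigned underflow when count is 0
        for (std::size_t j = 0; j + 1 < count; ++j) {
            if (std::strcmp(words[j], words[j + 1]) > 0) {
                const char* temp = words[j];
                words[j] = words[j + 1];
                words[j + 1] = temp;
                swapped = true;
            }
        }
    }
}
```

Swapping pointers costs the same regardless of word length, which is why the strings themselves stay where they are.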

To get the unique words, we use the sorted array.

In a loop, we check whether the current word is the same as the next word in the array. If it is, we count the duplicate and move on to the next word. If it is not, we store the current word in the new result array, where it will be unique.

We will also store the counter.
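The counting pass over a sorted array can be sketched as follows. The function name `uniqueAndCount` is hypothetical, the elements are std::string for brevity, and the caller is assumed to provide output arrays with room for `count` entries. Note that the last word has no successor, so it must be treated as the end of its run.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: collapse a sorted array into unique words plus counts.
// `unique` and `counts` must each have room for `count` entries.
// Returns the number of unique words written.
std::size_t uniqueAndCount(const std::string* sorted, std::size_t count,
                           std::string* unique, unsigned int* counts) {
    std::size_t uniqueCount = 0;
    unsigned int run = 1;                      // length of the current run of equal words
    for (std::size_t k = 0; k < count; ++k) {
        // A run ends at the last element or where the next word differs
        const bool lastOfRun = (k + 1 == count) || (sorted[k] != sorted[k + 1]);
        if (lastOfRun) {
            unique[uniqueCount] = sorted[k];
            counts[uniqueCount] = run;
            ++uniqueCount;
            run = 1;                           // start counting the next word
        } else {
            ++run;                             // duplicate found
        }
    }
    return uniqueCount;
}
```

Because the input is sorted, one pass is enough and no search through the already collected words is needed.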

So. There are many possible solutions. Please see one of them below. It is an unbelievable 350 lines of code and does not have much to do with C++. Anyway:

#include <iostream>
#include <fstream>

// Some abbreviations
typedef char* String;
typedef String *StringArray;
typedef unsigned int *CounterArray;

// Convert a character to a lower case character
char lowerCase(char c) {
    if (c >= 'A' and c <= 'Z')
        c += ('a' - 'A');
    return c;
}
// Check if the character is considered to be a part of a word. 
// If you want also numbers and underscores, then uncomment the commented part
bool isWordCharacter(char c) {
    return (c >= 'A' and c <= 'Z') or ((c >= 'a' and c <= 'z')) /*or (c >= '0' and c <= '9') or (c == '_')*/;
}

// Simple comparison of Strings, like C-library function
int stringCompare(String s1, String s2)
{
    // Check, as long as it is equal
    while ((*s1 != '\0' and *s2 != '\0') and *s1 == *s2) {
        s1++; s2++;
    }
    // compare the mismatching character and return the result
    return (*s1 == *s2) ? 0 : (*s1 > *s2) ? 1 : -1;
}

// Get the length of a string. Like C-library function
unsigned int stringLength(String string) {
    unsigned int result = 0;
    while (*string++) ++result;
    return result;
}

// Duplicate a string
String createAndCopy(String string) {
    unsigned int length = stringLength(string) + 1;
    String newString = new char[length];
    for (unsigned int k = 0; k < length; ++k)
        newString[k] = string[k];
    return newString;
}

// Get a filename from the user
String getFileName() {
    // Initial estimated size of string
    unsigned int stringSize = 32;
    String fileName = new char[stringSize + 1]{};

    // Read until '\n'
    char c;
    unsigned int index = 0;
    while (std::cin.get(c) and c!='\n') {

        // Check, if we have enough space
        if (index >= stringSize) {

            // No, create char array with double the space than before
            stringSize *= 2;
            String temp = new char[stringSize + 1] {};
            // Copy all data from string to temp
            for (unsigned int k = 0; k < index; ++k)
                temp[k] = fileName[k];

            // Delete old string
            delete[] fileName;
            // And reassign new one
            fileName = temp;
        }
        // Store the character 
        fileName[index++] = c;
    }
    // And the terminating 0
    fileName[index] = '\0';

    // Try to open the file
    std::ifstream ifs(fileName);

    // Check, if it could be opened
    if (not ifs) {
        // Could not be opened. Show error message   
        std::cerr << "\n\n*** Error: could not open file: '" << fileName << "'\n\n";

        // Delete allocated memory
        delete[] fileName;
        // Indicate a bad result 
        fileName = nullptr;
    }
    return fileName;
}

// Read all words from a stream to a dynamic array
unsigned int readWordsFromStreamToArray(std::ifstream* is, StringArray* stringArray) {

    const int InitialStringArraySize = 16u;
    const int InitialStringSize = 32u;

    // Define array of strings with initial array size and allocate memory
    unsigned int stringArraySize = InitialStringArraySize;
    unsigned int indexInStringArray = 0;
    *stringArray = new String[stringArraySize];

    // We have a local string for which we will later allocate memory
    unsigned int stringSize = InitialStringSize;
    unsigned int indexInString = 0;
    String string{};

    // We have 2 states. Either we wait for the beginning of a word or for the end of a word
    bool waitForBeginOfWord{true};

    // As long as we are in the condition to read characters
    bool readCharactersOK{ true };
    while (readCharactersOK) {
        
        // Read a character and check, if this was ok
        char c; is->get(c);
        readCharactersOK = (bool)(*is);

        // We are in one of 2 states. Wait for begin of word or wait for end of word
        if (waitForBeginOfWord) {
            // As long, as we do not find the begin of a new word
            if (readCharactersOK and isWordCharacter(c)) {
                // Go to state "wait for end of word"
                waitForBeginOfWord = false;
                // Now, we have a character from a word. Create a new string
                string = new char[stringSize + 1]{};
                // And start again with index 0
                indexInString = 0;
            }
        }

        // Are we in state wait for end of word?
        if (not waitForBeginOfWord) {
            // Do we have a valid character
            if (readCharactersOK and isWordCharacter(c)) {

                // Now, we have a letter that belongs to a word
                // We want to add this now to our string, but need to check if it is big enough
                if (indexInString >= stringSize) {

                    // string is bigger than expected, allocate more memory. Double than before
                    stringSize *= 2;

                    String temp = new char[stringSize + 1];

                    // Copy old string to new temp string
                    for (unsigned k = 0; k < indexInString; ++k)
                        temp[k] = string[k];

                    // Free the memory of the old string
                    delete[] string;

                    // And make the temp string to our current string
                    string = temp;
                }
                string[indexInString++] = lowerCase(c);
            }
            else {
                // Now we are either at end of file or we have read a non-word character
                // Now, a word is read. Terminate string with a 0
                string[indexInString] = '\0';

                // We want to add the word to the string array.
                // First check, if there is still enough space
                if (indexInStringArray >= stringArraySize) {

                    // We need more memory
                    stringArraySize *= 2;

                    // Create a bigger array
                    StringArray temp = new String[stringArraySize];

                    // Copy all strings from the old array to this temporary array
                    for (unsigned int k = 0; k < indexInStringArray; ++k)
                        temp[k] = (*stringArray)[k];

                    // Delete old memory
                    delete[] (*stringArray);

                    // And assign newly created memory
                    *stringArray = temp;
                }
                // Store next word in array
                (*stringArray)[indexInStringArray++] = string;

                // Next time, we need to wait for the begin of a word again.
                waitForBeginOfWord = true;
            }
        }
    } // Return number of words
    return indexInStringArray;
}
// Standard bubble sort algorithm. Only pointers will be exchanged
void bubbleSort(StringArray* stringArray, unsigned int numberOfWords) {

    // Check whether we still need to sort
    bool sorted = false; 

    // Abbreviation
    StringArray ptr = *stringArray;

    // As long as we need to sort
    while (!sorted) // repeat until no more swaps
    {
        sorted = true; // Assume everything sorted
        // "j + 1 < numberOfWords" avoids unsigned underflow if there are no words
        for (unsigned int j = 0; j + 1 < numberOfWords; j++) 
        {
            if (stringCompare(*(ptr + j), *(ptr + j + 1)) == 1) 
            {
                // Swap 2 pointers
                String temp = *(ptr + j);
                *(ptr + j) = *(ptr + j + 1);
                *(ptr + j + 1) = temp;
                // we swapped, so keep sorting
                sorted = false; 
            }
        }
    }
}

// Get unique words from a sorted array of words
unsigned int makeUuniqueAndCount(StringArray* stringArray, unsigned int numberOfWords, StringArray* uniqueStringArray, CounterArray* counterArray) {

    // first allocate memory for the resulting array
    *uniqueStringArray = new String[numberOfWords]{};
    *counterArray = new unsigned int[numberOfWords] {};

    // Indices for the resulting arrays
    unsigned int uniqueStringArrayIndex = 0;

    // Here we count the frequency of the words. A word always exists at least once
    unsigned int wordCounter = 1;

    // For all words in the original array
    for (unsigned int k = 0; k < numberOfWords; ++k) {

        // List is sorted, so identical words are adjacent.
        // The last word has no successor and always closes its run
        if (k + 1 == numberOfWords or stringCompare((*stringArray)[k], (*stringArray)[k + 1]) != 0) {

            // Different word found. Duplicate search for this word is over
            // Create a new string and copy old word to new array
            String s = createAndCopy((*stringArray)[k]);
            (*uniqueStringArray)[uniqueStringArrayIndex] = s;

            // Store the word counter for this word
            (*counterArray)[uniqueStringArrayIndex] = wordCounter;

            // We start now to count from the beginning
            wordCounter = 1;
            ++uniqueStringArrayIndex;
        }
        else {
            // Duplicate word found, increase word counter
            ++wordCounter;
        }
    }
    // And now, the original allocated array for the duplicate words are too big.
    // Allocate real size and recopy.
    if (uniqueStringArrayIndex != numberOfWords) {

        // Get new, exact fitting temp array
        StringArray temp1 = new String[uniqueStringArrayIndex];
        // Copy all words into temp
        for (unsigned int k = 0; k < uniqueStringArrayIndex; ++k)
            temp1[k] = (*uniqueStringArray)[k];
        // delete old content
        delete[](*uniqueStringArray);
        // And reassign
        (*uniqueStringArray) = temp1;

        // Get new, exact fitting temp array
        CounterArray temp2 = new unsigned int[uniqueStringArrayIndex];
        // Copy all counters into temp
        for (unsigned int k = 0; k < uniqueStringArrayIndex; ++k)
            temp2[k] = (*counterArray)[k];
        // delete old content
        delete[](*counterArray);
        // And reassign
        (*counterArray) = temp2;
    }
    return uniqueStringArrayIndex;
}

int main() {

    // Get a file name
    String fileName = getFileName();

    // If that worked and we got a valid file name
    if (fileName) {

        // Try to open the file
        std::ifstream ifs(fileName);

        // If that worked
        if (ifs) {
          
            // Define our arrays
            StringArray stringArray{};
            unsigned int numberOfWords = readWordsFromStreamToArray(&ifs, &stringArray);

            // Show result
            std::cout << "\n\nRaw word list:--------------------------------------------------\n";
            for (unsigned int i = 0; i < numberOfWords; ++i) {
                std::cout << i + 1 << '\t' << stringArray[i] << '\n';
            }

            // Sort
            bubbleSort(&stringArray, numberOfWords);
            std::cout << "\n\nSorted word list:--------------------------------------------------\n";
            // Show result
            for (unsigned int i = 0; i < numberOfWords; ++i) {
                std::cout << i + 1 << '\t' << stringArray[i] << '\n';
            }

            // Getting unique strings and count
            StringArray uniqueStringArray{};
            CounterArray counterArray{};
            unsigned int numberOfUniqes = makeUuniqueAndCount(&stringArray, numberOfWords, &uniqueStringArray, &counterArray);
            // Show result
            std::cout << "\n\nUnique word list and count:--------------------------------------------------\n";
            for (unsigned int i = 0; i < numberOfUniqes; ++i) {
                std::cout << i + 1 << '\t' << uniqueStringArray[i] << "\t --> " << counterArray[i] << '\n';
            }

            // Delete all dynamically allocated memory
            for (unsigned int k = 0; k < numberOfWords; ++k) {
                delete[] stringArray[k];
            }
            for (unsigned int k = 0; k < numberOfUniqes; ++k) {
                delete[] uniqueStringArray[k];
            }
            delete[] stringArray;
            delete[] uniqueStringArray;
            delete[] counterArray;
        }
        delete[] fileName;
    }
}

In modern C++ you would do it like this:

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <map>
#include <regex>
#include <algorithm>
#include <cctype>

const std::regex re{ R"(\w+)" };

int main() {
   
    // Tell user what to do: Input a file name
    std::cout << "Please enter a filename:\n";
    // Read the filename
    if (std::string fileName{}; std::getline(std::cin, fileName)) {

        // Open the file, and check, if it could be opened
        if (std::ifstream inputFileStream{ fileName }; inputFileStream) {

            // Read the complete file into a string
            std::string data(std::istreambuf_iterator<char>(inputFileStream), {});

            // Make everything lower case
            std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c) {return (char)std::tolower(c); });

            // Get all words from the string
            std::vector<std::string> words(std::sregex_token_iterator(data.begin(), data.end(), re), {});

            // Define a counter for the words
            std::map<std::string, size_t> counter{};

            // Show them and count them
            std::cout << "\n\n\nWord list:\n\n";
            for (const std::string& word : words) {
                std::cout << word << '\n';
                counter[word]++;
            }

            // Show sorted unique list with counts
            std::cout << "\n\n\nUnique counted word list:\n\n";
            for (const auto& [word, count] : counter)
                std::cout << word << "\t --> " << count << '\n';
        }
        else std::cerr << "\n\n*** Error. Could not open file '" << fileName << "'\n\n";
    }
}