I have created a program that must sort the words and search for the unique ones. It should also count the number of occurrences of these words in the list. The list of unique words and the frequency can be stored in dynamic arrays. The program saves the concordance list (list of unique words), along with the frequency of occurrence, in a data file that the user is prompted to provide. I cannot use vectors or any existing data structures such as the list class.
This is the question: You must develop a solution (software) that meets these specifications:
- The program prompts the user for the name of the input text file where the text is stored. The program must print an error message in case errors occur while opening the files. The program must read words and store them into an array of strings. Punctuation characters must be ignored. All alphabetical characters must be converted to lower case characters to eliminate case sensitivity.
- The program must sort the words and search for the unique ones. It should also count the number of occurrences of these words in the list. The list of unique words and the frequency can be stored in dynamic arrays.
- The program saves the concordance list (list of unique words), along with the frequency of occurrence, in a data file that the user is prompted to provide. The program must print a confirmation message on the output screen once the data is stored in the file.
- The program must print the concordance list on the output screen.
Here are a few notes about your solution:
- The program must be designed in a modular fashion. Multiple reusable functions will be implemented to solve the problem (such as a function to search for a string, a function to sort the elements of an array, a function to write the concordance in an output file, a function to return the next word from the input file, etc.).
- All nonalphabetical characters must be treated as delimiters for separating words in the text file.
- The size of the concordance (the total number of unique words) is unknown at compilation time. Dynamic memory allocation must be used to adjust the size of the concordance list at run-time, as needed.
- Static array dimensions should be given as symbolic constants. Such definitions should be used to declare arrays.
- When passing a one dimensional array to a function, pass the dimension as an argument. When passing a 2D array to a function, pass the row dimension as an argument. In the case of 2D arrays, it is necessary to use the symbolic constant for the column dimension in the parameter declaration of a function definition.
- You are NOT allowed to use any existing data structures (such as the list class) in the solution. Instead, you should create the concordance as a dynamic array of strings (or a dynamic 2D array of characters).
Here is my code:
#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <iomanip>
#define SIZE 100
void findUnique();
using namespace std;
int main()
{
string array[SIZE];
int loop = 0;
string line;
string letter;
ifstream file1;
file1.open("readText.txt");
if (file1.fail())
{
cerr << "error opening the file" << endl;
exit(-1);
}
else if (file1.is_open()) //if the file is open
{
while (!file1.eof()) //while the end of file is NOT reached
{
file1 >> line;
getline(file1, line); //get one line from the file
line.erase(std::remove_if(line.begin(), line.end(), ispunct), line.end());
array[loop] = line;
cout << array[loop] << endl; //and output it
loop++;
}
}
findUnique();
return (0);
}
void findUnique()
{
string filename;
cout << "Enter the name of the file" << endl;
cin >> filename;
ifstream file;
file.open(filename);
if (!file)
{
cout << "Error: Failed to open the file.";
}
else
{
string stringContents;
int stringSize = 0;
// find the number of words in the file
while (file >> stringContents)
stringSize++;
// close and open the file to start from the beginning of the file
file.close();
file.open(filename);
string* mainContents = new string[stringSize]; // dynamic array for strings found
int* frequency = new int[stringSize]; // dynamic array for frequency
int uniqueFound = 0; // no unique string found
for (int i = 0; i < stringSize && (file >> stringContents); i++)
{
//remove trailing punctuations
while (stringContents.size() && ispunct(stringContents.back()))
stringContents.pop_back();
// process string found
bool found = false;
for (int j = 0; j < uniqueFound; j++)
if (mainContents[j] == stringContents) { // if string already exist
frequency[j]++; // increment frequency
found = true;
}
if (!found) { // if string not found, add it !
mainContents[uniqueFound] = stringContents;
frequency[uniqueFound++] = 1; // and increment number of found
}
}
// display results
cout << "Word" << setw(20) << "Frequency\n";
for (int i = 0; i < uniqueFound; i++)
{
cout << mainContents[i] << "\t\t" << frequency[i] << endl;
ofstream file2;
file2.open("writeText.txt");
file2 << mainContents[i] << "\t\t" << frequency[i] << endl;
}
}
}
CodePudding user response:
You can simplify your program by using std::map as shown below. The program below finds the unique words in the input.txt file and keeps track of the count corresponding to each of them.
#include <iostream>
#include <map>
#include <string>
#include <sstream>
#include <fstream>
int main() {
//this map maps each word in the file to their respective count
std::map<std::string, int> stringCount;
std::string word, line;
int count = 0;//this count the total number of words
std::ifstream inputFile("input.txt");
if(inputFile)
{
while(std::getline(inputFile, line))//go line by line
{
std::istringstream ss(line);
while(ss >> word)//go word by word
{
//increment the count
stringCount[word]++;
}
}
}
else
{
std::cout<<"File cannot be opened"<<std::endl;
}
inputFile.close();
std::cout<<"Total number of unique words are:"<<stringCount.size()<<std::endl;
for(std::pair<std::string, int> pairElement: stringCount)
{
std::cout<<pairElement.first<<" : "<<pairElement.second<<std::endl;
}
return 0;
}
The output of the program can be seen here.
The above program prints the number of unique words in the file and also the count (frequency) corresponding to each of the unique words.
CodePudding user response:
I will show you a solution doing full dynamic memory management for everything.
Even for strings.
So, I will not use any library functions and no C++ containers. Just plain code. This needs many helper functions...
The dynamic memory management will be done with new in C++. The recurring problem is that we may not know the size to be allocated in advance.
But this can be handled with an assumed initial array size. If we see that this array size is not sufficient, then we allocate new memory with a bigger size (often double the previous size) and copy all elements from the previously used memory to the new memory. Then we delete the old memory and assign the newly allocated memory to the old pointer.
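The growth step described above can be sketched in a few lines. This is only an illustration, assuming the assignment's "dynamic array of strings" variant with std::string elements; the helper name appendWord and the starting capacity of 4 are hypothetical choices, not part of the full listing further down.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: append `value` to a heap-allocated array of strings.
// When the array is full, a new array with double the capacity is allocated,
// the old elements are copied over, and the old array is released.
void appendWord(std::string*& data, std::size_t& size, std::size_t& capacity,
                const std::string& value) {
    if (size == capacity) {
        // Assumed starting capacity of 4 for an empty array
        std::size_t newCapacity = (capacity == 0) ? 4 : capacity * 2;
        std::string* bigger = new std::string[newCapacity];
        for (std::size_t i = 0; i < size; ++i)
            bigger[i] = data[i];
        delete[] data;          // delete[] on a null pointer is harmless
        data = bigger;          // reassign the new memory to the old pointer
        capacity = newCapacity;
    }
    data[size++] = value;
}
```

A caller would start with a null pointer and size and capacity of zero, call appendWord once per word, and release the array with delete[] at the end.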
This will lead to many, many lines of repetitive code. And hence, std::string and std::vector have been invented.
However, it is your requirement to use this old and outdated approach. Nowadays it is even strongly discouraged to use new and delete, raw pointers for owned memory, and C-style arrays. In real life, you should never do this.
So, the basic functions. We start with reading the words from a file. We will use a loop with 2 states in it. Either we wait for the start of a word or for the end of a word. That depends on the character that we read from the file. If we are in the mode where we have found the beginning of a word, then we will copy character by character to a new string.
If we are at the end of a word, then we store the just read word in our array of words, and wait for the next beginning of the word.
This we will do until end of file.
For sorting we implement a standard bubble sort approach, where we will only sort the pointers to the strings, but not the strings themselves.
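A minimal sketch of such a pointer-only bubble sort follows. For brevity it uses std::strcmp from the C library, whereas the full listing below writes its own comparison; the function name sortPointers is an assumption for this example.

```cpp
#include <cstddef>
#include <cstring>

// Sorts an array of C-string pointers in ascending order.
// Only the pointers are swapped; the characters themselves are never copied.
void sortPointers(const char** words, std::size_t count) {
    bool swapped = true;
    while (swapped) {                       // repeat until a pass makes no swap
        swapped = false;
        // "j + 1 < count" also stays safe when count is 0
        for (std::size_t j = 0; j + 1 < count; ++j) {
            if (std::strcmp(words[j], words[j + 1]) > 0) {
                const char* temp = words[j];
                words[j] = words[j + 1];
                words[j + 1] = temp;
                swapped = true;
            }
        }
    }
}
```

Swapping pointers keeps each exchange cheap no matter how long the words are, which is the reason for sorting the pointers rather than the strings.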
For getting the uniques of words, we use the sorted array.
In a loop, we check whether the current word is the same as the next word in the array. If it is equal, we count the duplicate and move on to the next word. If it is not equal, the run of duplicates has ended, so we store the current word in the new result array. This entry is then unique.
We will also store the counter.
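The counting pass over the sorted array could look roughly like this. It is a sketch under the assumption that the words are held as std::string and that the caller has already allocated both output arrays with room for count entries; the name countUnique is hypothetical.

```cpp
#include <cstddef>
#include <string>

// `sorted` holds `count` words in ascending order, so equal words are adjacent.
// Each distinct word is written to `unique` and its frequency to `frequency`.
// Returns the number of distinct words.
std::size_t countUnique(const std::string* sorted, std::size_t count,
                        std::string* unique, unsigned int* frequency) {
    std::size_t uniqueCount = 0;
    unsigned int run = 1;      // a word that exists occurs at least once
    for (std::size_t k = 0; k < count; ++k) {
        // A run ends at the last element or where the next word differs
        bool runEnds = (k + 1 == count) || (sorted[k] != sorted[k + 1]);
        if (runEnds) {
            unique[uniqueCount] = sorted[k];
            frequency[uniqueCount] = run;
            ++uniqueCount;
            run = 1;
        } else {
            ++run;
        }
    }
    return uniqueCount;
}
```

Treating the last element as the end of a run matters: without that check the final word of the sorted array would never be stored.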
So, there are many possible solutions. Please see one of them below. It is an unbelievable 350 lines of code and does not have much to do with C++. Anyway:
#include <iostream>
#include <fstream>
// Some abbreviations
typedef char* String;
typedef String *StringArray;
typedef unsigned int *CounterArray;
// Convert a character to a lower case character
char lowerCase(char c) {
if (c >= 'A' and c <= 'Z')
c += ('a' - 'A');
return c;
}
// Check if the character is considered to be a part of a word.
// If you want also numbers and underscores, then uncomment the commented part
bool isWordCharacter(char c) {
return (c >= 'A' and c <= 'Z') or ((c >= 'a' and c <= 'z')) /*or (c >= '0' and c <= '9') or (c == '_')*/;
}
// Simple comparison of Strings, like C-library function
int stringCompare(String s1, String s2)
{
// Check, as long as it is equal
while ((*s1 != '\0' and *s2 != '\0') and *s1 == *s2) {
++s1; ++s2;
}
// compare the mismatching character and return the result
return (*s1 == *s2) ? 0 : (*s1 > *s2) ? 1 : -1;
}
// Get the length of a string. Like C-library function
unsigned int stringLength(String string) {
unsigned int result = 0;
while (*string++) ++result;
return result;
}
// Duplicate a string
String createAndCopy(String string) {
unsigned int length = stringLength(string) + 1;
String newString = new char[length];
for (unsigned int k = 0; k < length; ++k)
newString[k] = string[k];
return newString;
}
// Get a filename from the user
String getFileName() {
// Initial estimated size of string
unsigned int stringSize = 32;
String fileName = new char[stringSize + 1]{};
// Read until '\n'
char c;
unsigned int index = 0;
while (std::cin.get(c) and c!='\n') {
// Check, if we have enough space
if (index >= stringSize) {
// No, create char array with double the space than before
stringSize *= 2;
String temp = new char[stringSize + 1] {};
// Copy all data from string to temp
for (unsigned int k = 0; k < index; ++k)
temp[k] = fileName[k];
// Delete old string
delete[] fileName;
// And reassign new one
fileName = temp;
}
// Store the character
fileName[index++] = c;
}
// And the terminating 0
fileName[index] = '\0';
// Try to open the file
std::ifstream ifs(fileName);
// Check, if it could be opened
if (not ifs) {
// Could not be opened. Show error message
std::cerr << "\n\n*** Error: could not open file: '" << fileName << "'\n\n";
// Delete allocated memory
delete[] fileName;
// Indicate a bad result
fileName = nullptr;
}
return fileName;
}
// Read all words from a stream to a dynamic array
unsigned int readWordsFromStreamToArray(std::ifstream* is, StringArray* stringArray) {
const int InitialStringArraySize = 16u;
const int InitialStringSize = 32u;
// Define array of strings with initial array size and allocate memory
unsigned int stringArraySize = InitialStringArraySize;
unsigned int indexInStringArray = 0;
*stringArray = new String[stringArraySize];
// We have a local string for which we will later allocate memory
unsigned int stringSize = InitialStringSize;
unsigned int indexInString = 0;
String string{};
// We have 2 states. Either we wait for the beginning of a word or for the end of a word
bool waitForBeginOfWord{true};
// As long as we are in the condition to read characters
bool readCharactersOK{ true };
while (readCharactersOK) {
// Read a character and check, if this was ok
char c; is->get(c);
readCharactersOK = (bool)(*is);
// We are in one of 2 states. Wait for begin of word or wait for end of word
if (waitForBeginOfWord) {
// As long, as we do not find the begin of a new word
if (readCharactersOK and isWordCharacter(c)) {
// Go to state "wait for end of word"
waitForBeginOfWord = false;
// Now, we have a character from a word. Create a new string
string = new char[stringSize + 1]{};
// And start again with index 0
indexInString = 0;
}
}
// Are we in state wait for end of word?
if (not waitForBeginOfWord) {
// Do we have a valid character
if (readCharactersOK and isWordCharacter(c)) {
// Now, we have a letter that belongs to a word
// We want to add this now to our string, but need to check if it is big enough
if (indexInString >= stringSize) {
// string is bigger than expected, allocate more memory. Double than before
stringSize *= 2;
String temp = new char[stringSize + 1];
// Copy old string to new temp string
for (unsigned k = 0; k < indexInString; ++k)
temp[k] = string[k];
// Free the memory of the old string
delete[] string;
// And make the temp string to our current string
string = temp;
}
string[indexInString++] = lowerCase(c);
}
else {
// Now we are either at end of file or we have read a non-word character
// Now, a word is read. Terminate string with a 0
string[indexInString] = '\0';
// We want to add the word to the string array.
// First check, if there is still enough space
if (indexInStringArray >= stringArraySize) {
// We need more memory
stringArraySize *= 2;
// Create a bigger array
StringArray temp = new String[stringArraySize];
// Copy all strings from the old array to this temporary array
for (unsigned int k = 0; k < indexInStringArray; ++k)
temp[k] = (*stringArray)[k];
// Delete old memory
delete[] (*stringArray);
// And assign newly created memory
*stringArray = temp;
}
// Store next word in array
(*stringArray)[indexInStringArray++] = string;
// Next time, we need to wait for the begin of a word again.
waitForBeginOfWord = true;
}
}
} // Return number of words
return indexInStringArray;
}
// Standard bubble sort algorithm. Only pointers will be exchanged
void bubbleSort(StringArray* stringArray, unsigned int numberOfWords) {
// Check whether we still need to sort
bool sorted = false;
// Abbreviation
StringArray ptr = *stringArray;
// As long as we need to sort
while (!sorted) // repeat until no more swaps
{
sorted = true; // Assume everything sorted
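// "j + 1 < numberOfWords" avoids unsigned underflow when there are no words
for (unsigned int j = 0; j + 1 < numberOfWords; j++)
{
if (stringCompare(*(ptr + j), *(ptr + j + 1)) == 1)
{
// Swap 2 pointers
String temp = *(ptr + j);
*(ptr + j) = *(ptr + j + 1);
*(ptr + j + 1) = temp;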
// we swapped, so keep sorting
sorted = false;
}
}
}
}
// Get unique words from a sorted array of words
unsigned int makeUuniqueAndCount(StringArray* stringArray, unsigned int numberOfWords, StringArray* uniqueStringArray, CounterArray* counterArray) {
// first allocate memory for the resulting array
*uniqueStringArray = new String[numberOfWords]{};
*counterArray = new unsigned int[numberOfWords] {};
// Indices for the resulting arrays
unsigned int uniqueStringArrayIndex = 0;
// Here we count the frequency of the words. A word always exists at least once
unsigned int wordCounter = 1;
// For all words in the original array
for (unsigned int k = 0; k < numberOfWords; ++k) {
// List is sorted, so identical words are adjacent. A run of equal words ends
// at the last word of the array or where the next word differs
if (k + 1 == numberOfWords or stringCompare((*stringArray)[k], (*stringArray)[k + 1]) != 0) {
// Different word found (or end of array). Duplicate search for this word is over
// Create a new string and copy old word to new array
String s = createAndCopy((*stringArray)[k]);
(*uniqueStringArray)[uniqueStringArrayIndex] = s;
// Store the word counter for this word
(*counterArray)[uniqueStringArrayIndex] = wordCounter;
// We start now to count from the beginning
wordCounter = 1;
++uniqueStringArrayIndex;
}
else {
// Duplicate word found, increase word counter
++wordCounter;
}
}
// And now, the original allocated array for the duplicate words are too big.
// Allocate real size and recopy.
if (uniqueStringArrayIndex != numberOfWords) {
// Get new, exact fitting temp array
StringArray temp1 = new String[uniqueStringArrayIndex];
// Copy all words into temp
for (unsigned int k = 0; k < uniqueStringArrayIndex; ++k)
temp1[k] = (*uniqueStringArray)[k];
// delete old content
delete[](*uniqueStringArray);
// And reassign
(*uniqueStringArray) = temp1;
// Get new, exact fitting temp array
CounterArray temp2 = new unsigned int[uniqueStringArrayIndex];
// Copy all counters into temp
for (unsigned int k = 0; k < uniqueStringArrayIndex; ++k)
temp2[k] = (*counterArray)[k];
// delete old content
delete[](*counterArray);
// And reassign
(*counterArray) = temp2;
}
return uniqueStringArrayIndex;
}
int main() {
// Get a file name
String fileName = getFileName();
// If that worked and we got a valid file name
if (fileName) {
// Try to open the file
std::ifstream ifs(fileName);
// If that worked
if (ifs) {
// Define our arrays
StringArray stringArray{};
unsigned int numberOfWords = readWordsFromStreamToArray(&ifs, &stringArray);
// Show result
std::cout << "\n\nRaw word list:--------------------------------------------------\n";
for (unsigned int i = 0; i < numberOfWords; ++i) {
std::cout << i + 1 << '\t' << stringArray[i] << '\n';
}
// Sort
bubbleSort(&stringArray, numberOfWords);
std::cout << "\n\nSorted word list:--------------------------------------------------\n";
// Show result
for (unsigned int i = 0; i < numberOfWords; ++i) {
std::cout << i + 1 << '\t' << stringArray[i] << '\n';
}
// Getting unique strings and count
StringArray uniqueStringArray{};
CounterArray counterArray{};
unsigned int numberOfUniqes = makeUuniqueAndCount(&stringArray, numberOfWords, &uniqueStringArray, &counterArray);
// Show result
std::cout << "\n\nUnique word list and count:--------------------------------------------------\n";
for (unsigned int i = 0; i < numberOfUniqes; ++i) {
std::cout << i + 1 << '\t' << uniqueStringArray[i] << "\t --> " << counterArray[i] << '\n';
}
// Delete all dynamically allocated memory
for (unsigned int k = 0; k < numberOfWords; ++k) {
delete[] stringArray[k];
}
for (unsigned int k = 0; k < numberOfUniqes; ++k) {
delete[] uniqueStringArray[k];
}
delete[] stringArray;
delete[] uniqueStringArray;
delete[] counterArray;
}
delete[] fileName;
}
}
In modern C++ you would do it like this:
#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <iterator>
#include <map>
#include <regex>
#include <algorithm>
#include <cctype>
const std::regex re{ R"(\w+)" };
int main() {
// Tell user what to do: Input a file name
std::cout << "Please enter a filename:\n";
// Read the filename
if (std::string fileName{}; std::getline(std::cin, fileName)) {
// Open the file, and check, if it could be opened
if (std::ifstream inputFileStream{ fileName }; inputFileStream) {
// Read the complete file into a string
std::string data(std::istreambuf_iterator<char>(inputFileStream), {});
// Make everything lower case (unsigned char avoids undefined behavior in std::tolower)
std::transform(data.begin(), data.end(), data.begin(), [](unsigned char c) {return (char)std::tolower(c); });
// Get all words from the string
std::vector<std::string> words(std::sregex_token_iterator(data.begin(), data.end(), re), {});
// Define a counter for the words
std::map<std::string, size_t> counter{};
// Show them and count them
std::cout << "\n\n\nWord list:\n\n";
for (const std::string& word : words) {
std::cout << word << '\n';
counter[word]++;
}
// Show sorted unique list with counts
std::cout << "\n\n\nUnique counted word list:\n\n";
for (const auto& [word, count] : counter)
std::cout << word << "\t --> " << count << '\n';
}
else std::cerr << "\n\n*** Error. Could not open file '" << fileName << "'\n\n";
}
}