I'm trying to read a file containing a paragraph, count the number of times specific words occur (words that I have specified and stored in an array) and then print that result to another file that would look something like,
systems, 2
computer, 3
programming, 6
and so on. Currently, all this code does is spit out every word in the paragraph and their respective counts. Any help would be much appreciated.
#include <stdio.h>
#include <string.h>

int main()
{
    FILE* in;
    FILE* out;
    char arr1[13][100] = { "systems", "programming", "computer", "applications", "language", "machine"};
    int arr2[180] = {0};
    int count = 0;
    char temp[150];

    in = fopen("out2.dat", "r");
    out = fopen("out3.dat", "w");

    while (fscanf(in, "%s", temp) != EOF)
    {
        int i, check = 8;
        for (i = 0; i < count; i++)
        {
            if (strcmp(temp, arr1[i]) == 0)
            {
                arr2[i]++;
                check = 1;
                break;
            }
        }
        if (check == 1) continue;
        strcpy(arr1[count], temp);
        arr2[count++] = 1;
    }

    int i;
    for (i = 0; i < count; i++)
        fprintf(out, "%s, %d\n", arr1[i], arr2[i]);
    return 0;
}
CodePudding user response:
The use of count does not make much sense throughout this program.
It is declared as int count = 0; and then used as the upper bound in this loop
for (i = 0; i < count; i++)
limiting which search words will be used. This also means that this loop will not be entered on the first iteration of the surrounding while loop.
As such, check != 1, so count is then used as the index in arr1 into which the currently read "word" is copied
strcpy(arr1[count], temp);
which makes absolutely no sense. Why overwrite data you are searching for?
Then count is incremented to 1 after being used to set the first element of arr2 to 1.
On the second iteration of the while loop, the for loop will run for exactly one iteration, comparing the newly read "word" (temp) against the first element of arr1 (which is now the previously read "word").
If this matches: the first element in arr2 is incremented from 1 to 2, the string copy is skipped, and count is not incremented.
If this does not match: the new "word" is copied into the second element of arr1, the second element of arr2 is set to 1, and count is incremented to 2.
This spirals out of control from here.
Given the input shown above, this accesses arr1 out-of-bounds when count reaches 13.
With files that have a small selection of data (<= 13 unique "words", lengths < 100), this may accidentally "work" by populating arr1 with the words from the file. This will have the end effect of showing you the counts of each "word" in the input file.
Eventually, you will invoke Undefined Behavior when one of the following occurs:
- fscanf(in, "%s", temp) reads a string that overflows the temp buffer.
- count exceeds the bounds of arr1 or arr2.
- strcpy(arr1[count], temp); copies a string that overflows a buffer in arr1.
- Either fopen call fails, since the resulting null pointer is then passed on to fscanf or fprintf.
In addition to being unsafe, fscanf(in, "%s", temp) will consider anything other than whitespace as being part of a valid string. This includes trailing punctuation, which may or may not be an issue depending on which tokens you want to match (systems. vs. systems). You may need more robust parsing.
In any case, either create an array of structures composed of search words and frequencies, or create two arrays of the same length to represent this data:
const char *words[6] = { "systems", "programming", "computer", "applications", "language", "machine"};
unsigned freq[6] = { 0 };
There is no need to copy anything. Remember to check if fopen fails, and to limit %s when reading so as not to overflow the input buffer.
The rest of the program looks similar: test each input "word" against all search words; increment the corresponding frequency if a match.
An example using an array of structures:
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Each search word is paired with its own counter. */
    struct {
        const char *word;
        unsigned freq;
    } search_words[] = {
        { "systems", 0 },
        { "programming", 0 },
        { "computer", 0 },
        { "applications", 0 },
        { "language", 0 },
        { "machine", 0 }
    };
    size_t length = sizeof search_words / sizeof *search_words;

    FILE *input_file = fopen("out2.dat", "r");
    FILE *output_file = fopen("out3.dat", "w");

    if (!input_file || !output_file) {
        /* fclose must not be given a null pointer, so close only what opened. */
        if (input_file)
            fclose(input_file);
        if (output_file)
            fclose(output_file);
        fprintf(stderr, "Could not access files.\n");
        return 1;
    }

    char word[256];

    /* %255s leaves room for the null terminator in word[256]. */
    while (1 == fscanf(input_file, "%255s", word))
        for (size_t i = 0; i < length; i++)
            if (0 == strcmp(word, search_words[i].word))
                search_words[i].freq++;

    fclose(input_file);

    for (size_t i = 0; i < length; i++)
        fprintf(output_file, "%s, %u\n",
                search_words[i].word,
                search_words[i].freq);

    fclose(output_file);
}
cat out3.dat:
systems, 1
programming, 1
computer, 2
applications, 2
language, 1
machine, 1