Hi I am working on a function that receives a file (.txt) pointer, a first struct (could be a list or a hash's table) and a second struct (could be list or a hash's table) and a function which receive 2 structures and store the file words on one of them. I made a word_struct
function because I want to have got only function to open a dictionary and store the words in a hash table, and open other text file, and save only the words which isn't in the dictionary in a list. So just I need to change the strct_f
function to open each file type. But should be there a faster way to do that, because it takes a few seconds to work (maybe because the text file with the dictionary has got 700.000 words).
It is important to say, that this method avoids storing a big array for the dictionary, and just work with each word. So I haven`t spend a lot of memory at the same time.
Is it a complex and dirty code? and is there other way to do this? Thanks you so much!
void *word_struct(FILE *fptr, void *strct, void *strct2, strct_f strct_f) {
int empty_file = 0, i = 0;
char a;
char *temp = malloc(sizeof(char) * BUFFER);
while (empty_file != 1) {
if ((a = fgetc(fptr)) == EOF) {
empty_file = 1;
if (i != 0) {
temp[i] = '\0';
strct_f(strct, strct2, temp);
}
free(temp);
} else
if ((a >= 97 && a <= 122) || (a >= 65 && a <= 90)) {
temp[i] = (char)tolower(a);
i = 1;
} else {
if (i != 0) {
temp[i] = '\0';
strct_f(strct, strct2, temp);
free(temp);
temp = malloc(sizeof(char) * BUFFER);
i = 0;
}
}
}
return strct;
}
CodePudding user response:
There are some issues in your code:
- using opaque
void *
arguments and return value is unsafe: this defeats the compiler type checking features. strct_f strct_f
is ugly: you shadow the name of the type with the variable name. Usestrct_f fun
instead.a
must be defined as anint
to testEOF
properly and calltolower()
with an argument in the proper range. It is more idiomatic to use the namec
for this variable.- you always allocate
BUFFER
bytes for all words. This is wasteful and may cause the program to run slower - you do not test for buffer overflow when storing the next character into the
temp
array, invoking undefined behavior on bogus dictionaries. - the last
temp
buffer is not freed, causing a memory leak. - hard coding ASCII values for the letters is non portable, hard to read and less efficient than using
isalpha(c)
- the words in the dictionary might legitimately contain dashes and some other non alpha bytes.
Here is a modified version:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void *word_struct(FILE *fp, void *p1, void *p2, strct_f fun) {
char buf[BUFFER];
size_t i = 0;
for (;;) {
int c = getc(fp);
if (isalpha(c)) {
if (i 1 < sizeof(buf))
buf[i ] = (char)tolower(c);
}
} else {
if (i > 0) {
buf[i] = '\0';
fun(p1, p2, strdup(buf));
i = 0;
}
if (c == EOF)
break;
}
}
return p1;
}