Home > Net >  Open files and work faster with the words (In C)
Open files and work faster with the words (In C)

Time:07-03

Hi I am working on a function that receives a file (.txt) pointer, a first struct (could be a list or a hash's table) and a second struct (could be list or a hash's table) and a function which receive 2 structures and store the file words on one of them. I made a word_struct function because I want to have got only function to open a dictionary and store the words in a hash table, and open other text file, and save only the words which isn't in the dictionary in a list. So just I need to change the strct_f function to open each file type. But should be there a faster way to do that, because it takes a few seconds to work (maybe because the text file with the dictionary has got 700.000 words). It is important to say, that this method avoids storing a big array for the dictionary, and just work with each word. So I haven`t spend a lot of memory at the same time.

Is it a complex and dirty code? and is there other way to do this? Thanks you so much!

void *word_struct(FILE *fptr, void *strct, void *strct2, strct_f strct_f) {
    int empty_file = 0, i = 0;
    char a;            
    char *temp = malloc(sizeof(char) * BUFFER);
    while (empty_file != 1) {
        if ((a = fgetc(fptr)) == EOF) {
            empty_file = 1;
            if (i != 0) {
                temp[i] = '\0';
                strct_f(strct, strct2, temp);
            }
            free(temp);
        } else
        if ((a >= 97 && a <= 122) || (a >= 65 && a <= 90)) {
            temp[i] = (char)tolower(a);
            i  = 1;
        } else {
            if (i != 0) {
                temp[i] = '\0';
                strct_f(strct, strct2, temp);
                free(temp);
                temp = malloc(sizeof(char) * BUFFER);
                i = 0;  
            }
        }
    }
    return strct;
}

CodePudding user response:

There are some issues in your code:

  • using opaque void * arguments and return value is unsafe: this defeats the compiler type checking features.
  • strct_f strct_f is ugly: you shadow the name of the type with the variable name. Use strct_f fun instead.
  • a must be defined as an int to test EOF properly and call tolower() with an argument in the proper range. It is more idiomatic to use the name c for this variable.
  • you always allocate BUFFER bytes for all words. This is wasteful and may cause the program to run slower
  • you do not test for buffer overflow when storing the next character into the temp array, invoking undefined behavior on bogus dictionaries.
  • the last temp buffer is not freed, causing a memory leak.
  • hard coding ASCII values for the letters is non portable, hard to read and less efficient than using isalpha(c)
  • the words in the dictionary might legitimately contain dashes and some other non alpha bytes.

Here is a modified version:

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void *word_struct(FILE *fp, void *p1, void *p2, strct_f fun) {
    char buf[BUFFER];
    size_t i = 0;

    for (;;) {
        int c = getc(fp);
        if (isalpha(c)) {
            if (i   1 < sizeof(buf))
                buf[i  ] = (char)tolower(c);
            }
        } else {
            if (i > 0) {
                buf[i] = '\0';
                fun(p1, p2, strdup(buf));
                i = 0;  
            }
            if (c == EOF)
                break;
        }
    }
    return p1;
}
  • Related