Home > OS >  Count letters from a File in C
Count letters from a File in C

Time:10-27

I am trying to create a program that reads from a file and counts the occurrence of each alphabetic character in the file. Below is what I have so far, however the counts returned (stored in counters array) are higher than expected.

  void count_letters(const char *filename, int counters[26]) {
    FILE* in_file = fopen(filename, "r");
    const char ALPHABET[] = "abcdefghijklmnopqrstuvwxyz";
    if(in_file == NULL){
        printf("Error(count_letters): Could not open file %s\n",filename);
        return;
    }
    char line[200];
    while(fgets(line, sizeof(line),in_file) != NULL){ //keep reading lines until there's nothing left to read
        for(int pos = 0; pos < sizeof(line); pos  ){//iterate through each character in line...
            if(isalpha(line[pos])){//skip checking and increment position if current char is not alphabetical
                for(int i = 0; i < 26; i  ){//...for each character in the alphabet
                    if(tolower(line[pos]) == tolower(ALPHABET[i]))//upper case and lower case are counted as same
                        counters[i]  ;    // increment the current element in counters for each match in the line
                }
            }
        }
    }
    fclose(in_file);
    return;
}

CodePudding user response:

In for(int pos = 0; pos < sizeof(line); pos ), sizeof(line) evaluates to the size of the entire array line, not the part that was filled in by the most recent fgets call. So, after long lines, the loop repeatedly counts characters left in the array from reading short lines.

Modify the loop to iterate only through the part of line most recently filled in by fgets. You can do that by exiting the loop when a null character is seen.

CodePudding user response:

My two cents for a somewhat simpler solution (you have got an awful lot of loops ;) ). Reading input line-by-line is preferred in most situations, but since you're simply counting characters here, I don't think this is one of them, and ultimately adds complication. This answer also assumes ASCII character encoding, which as noted in the comments of another answer is not guaranteed by the C standard. You can modify as needed with your char ALPHABET for ultimate portability

#include <stdio.h>
#include <ctype.h>

#define NUM_LETTERS 26

int main(void)
{
    FILE* in_file = fopen("/path/to/my/file.txt", "r");
    if (in_file == NULL) exit(-1);

    unsigned charCounts[NUM_LETTERS] = {0};
    int curChar;
    // rather than reading line-by-line, read one character at a time
    while ((curChar = fgetc(in_file)) != EOF)
    {
        // only proceed if it is a letter
        if (isalpha(curChar))
        {
            // this is bad if not using ASCII, instead you'd need another
            // loop to check your ALPHABET, but increment the count here
            (charCounts[tolower(curChar) - 'a'])  ;
        }
    }

    // print out the results
    for (int i=0; i<NUM_LETTERS; i  )
    {
        // 'A' i also assumes ASCII encoding
        printf("%c: %u\n", 'A' i, charCounts[i]);
    }
}

Demo using stdin instead of a file.

CodePudding user response:

You have an error in for(int pos = 0; pos < sizeof(line); .... You assume that all 200 positions in the array are valid characters, but this is true only for a text in which each line has 200 characters. You should count only the characters in the initialized part of the string. The length of it varies from line to line:

for(int pos = 0; pos < strlen(line); ...

Also, you do not need the most inner loop because all alphabetic characters very likely have sequential ASCII codes:

if(isalpha(line[pos]))
    counters[tolower(line[pos]) - 'a']  ;

I assume that counters have been previously initialized with 0s. If not, you must initialize this array before counting.

CodePudding user response:

you do not need to use fgets as character functions will work same fast as file system uses its own buffering.

#define NLETTERS    ('z' - 'a'   1)

int countLetters(FILE *fi, size_t *counter)
{
    int ch;
    if(fi && counter)
    {
        memset(counter, 0, sizeof(*counter * NLETTERS));
        while((ch = fgetc(fi)) != EOF)
        {
            if(isalpha(ch))
            {
                counter[tolower(ch) - 'a']  ;
            }
        }
        return 0;
    }
    return 1;
}
  • Related