In C, I need some help fixing my code to read the frequency of each character in a file, and display

Time:04-25

I have written a C program to print the frequency of every character in a file called "harrypotter1.txt" (the whole first Harry Potter book). It mostly works, but it also prints seemingly random blank entries followed by ": 0", when it should only print characters that actually appear in the file. My code and the output it prints to the screen are listed below. Can someone help me fix the problem? NOTE: I need to use the struct!

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    
    struct pair //struct to store frequency and value 
    {
        int frequency;
        char value;
    };
    
    int main()
    {
        struct pair table[128]; //set to 128 because these are the main characters
    
        int fd; // file descriptor for opening file
        char buffer[1]; // buffer for reading through files bytes
    
        fd = open("harrypotter1.txt", O_RDONLY); // open a file in read mode
        
        for (int j = 0; j < 128; j++) // for loop to initialize the array of pair (struct)
        {
            table[j].value = j; // table with index j sets the struct char value to equal the index
            table[j].frequency = 0; // then the table will initialize the frequency to be 0
        }
    
        while((read(fd, buffer, 1)) > 0) // read each character and count frequency
        {
              int k = buffer[0]; // index k is the integer value of buffer[0], because each letter has an ASCII number
              table[k].frequency++; // use the struct pair table with index k to count the frequency of each character in the text file
        }
    
        close(fd); // close the file
    
        for (int i = 0; i < 128; i++) // use for loop to print frequency of characters
        {       
                printf("%c: %d\n",table[i].value, table[i].frequency); // print characters and its frequency
        }
        
        return 0; //end of code
    }

Output:

    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
            : 3
    
    : 10702
    
    : 0
    
    : 0
    : 10702
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
    : 0
     0
    : 0
    : 0
    : 0
    : 0
     : 70803
    !: 474
    ": 4758
    #: 0
    $: 0
    %: 0
    &: 0
    ': 3141
    (: 30
    ): 33
    *: 2
    +: 0
    ,: 5658
    -: 1990
    .: 6136
    /: 0
    0: 5
    1: 11
    2: 3
    3: 8
    4: 6
    5: 2
    6: 1
    7: 4
    8: 1
    9: 4
    :: 69
    ;: 135
    <: 0
    =: 0
    >: 0
    ?: 754
    @: 0
    A: 703
    B: 348
    C: 293
    D: 685
    E: 287
    F: 426
    G: 492
    H: 2996
    I: 1393
    J: 51
    K: 79
    L: 209
    M: 665
    N: 488
    O: 332
    P: 639
    Q: 203
    R: 660
    S: 844
    T: 1055
    U: 193
    V: 192
    W: 653
    X: 2
    Y: 326
    Z: 5
    [: 0
    \: 1
    ]: 0
    ^: 0
    _: 0
    `: 0
    a: 25887
    b: 4980
    c: 6403
    d: 15932
    e: 39628
    f: 6431
    g: 8127
    h: 19535
    i: 19422
    j: 319
    k: 3930
    l: 14385
    m: 6729
    n: 21337
    o: 25809
    p: 4909
    q: 217
    r: 20990
    s: 18870
    t: 27993
    u: 9562
    v: 2716
    w: 7744
    x: 381
    y: 8293
    z: 259
    {: 0
    |: 0
    }: 0
    ~: 1
    : 0


CodePudding user response:

You can use an array instead of the struct, since the value is simply the array index (table[j].value = j;). The version below initializes the array in its declaration instead of assigning the initial values in a loop, adds error checking for open(), and uses isprint() to decide whether a given character should be printed:

#include <ctype.h>
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// LEN must be a power of two (2^i) so that the bitwise & below works as an index mask
#define LEN 128

int main() {
    unsigned frequency[LEN] = { 0 };

    int fd = open("harrypotter1.txt", O_RDONLY);
    if(fd == -1) {
        printf("%s\n", strerror(errno));
        return 1;
    }
    unsigned char buffer;
    while((read(fd, &buffer, 1)) > 0) {
        frequency[buffer & (LEN-1)]++;
    }
    close(fd);

    for (int i = 0; i < LEN; i++) {
        if(isprint(i))
            printf("%c: %u\n", i, frequency[i]);
    }
    return 0;
}

With the content "hello world" in the input file, I get the following output:

 : 1
!: 0
": 0
#: 0
$: 0
%: 0
&: 0
': 0
(: 0
): 0
*: 0
+: 0
,: 0
-: 0
.: 0
/: 0
0: 0
1: 0
2: 0
3: 0
4: 0
5: 0
6: 0
7: 0
8: 0
9: 0
:: 0
;: 0
<: 0
=: 0
>: 0
?: 0
@: 0
A: 0
B: 0
C: 0
D: 0
E: 0
F: 0
G: 0
H: 0
I: 0
J: 0
K: 0
L: 0
M: 0
N: 0
O: 0
P: 0
Q: 0
R: 0
S: 0
T: 0
U: 0
V: 0
W: 0
X: 0
Y: 0
Z: 0
[: 0
\: 0
]: 0
^: 0
_: 0
`: 0
a: 0
b: 0
c: 0
d: 1
e: 1
f: 0
g: 0
h: 1
i: 0
j: 0
k: 0
l: 3
m: 0
n: 0
o: 2
p: 0
q: 0
r: 1
s: 0
t: 0
u: 0
v: 0
w: 1
x: 0
y: 0
z: 0
{: 0
|: 0
}: 0
~: 0
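
Since the question states that the struct must be kept, the same isprint() filter can also be applied to the original struct pair table. The following is only a minimal sketch under that assumption: the helper names init_table, count_bytes, and print_printable are hypothetical (they appear nowhere in the question or the answer), and the counting is done on an in-memory buffer rather than on the file descriptor.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

#define TABLE_LEN 128 /* 7-bit ASCII, as in the question */

struct pair {
    int frequency; /* how many times the character occurred */
    char value;    /* the character itself */
};

/* Set every entry to its own character code with a count of zero. */
static void init_table(struct pair table[TABLE_LEN]) {
    assert(table != NULL);
    for (int j = 0; j < TABLE_LEN; j++) {
        table[j].value = (char)j;
        table[j].frequency = 0;
    }
}

/* Count the bytes in data[0..n); bytes outside 0..127 are skipped
   so that they can never index past the end of the table. */
static void count_bytes(const unsigned char *data, size_t n,
                        struct pair table[TABLE_LEN]) {
    assert(data != NULL && table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (data[i] < TABLE_LEN) {
            table[data[i]].frequency++;
        }
    }
}

/* Print only the printable entries; returns the number of lines written. */
static int print_printable(FILE *out, const struct pair table[TABLE_LEN]) {
    int lines = 0;
    for (int i = 0; i < TABLE_LEN; i++) {
        if (isprint(i)) {
            fprintf(out, "%c: %d\n", table[i].value, table[i].frequency);
            lines++;
        }
    }
    return lines;
}
```

In a full program, each chunk returned by read() would be passed to count_bytes, and print_printable(stdout, table) would replace the final printing loop.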

CodePudding user response:

The C answer by Allan Wind is good insofar as it produces the correct results, but it allocates a larger array than the minimum needed to solve the problem. This wasted space is a compromise forced by the fact that C array indices must start at 0, while the first printable character ' ' has a value of 32 and the last printable character '~' has a value of 126.

with Ada.Text_IO; use Ada.Text_IO;

procedure count_graphic_characters is
   subtype graphix is Character range ' ' .. '~';
   counts   : array (graphix) of Natural := (Others => 0);
   The_file : File_Type;
   C        : Character;
begin
   Open
     (File => The_file, Mode => In_File,
      Name => "src\count_graphic_characters.adb");
   while not End_Of_File (The_file) loop
      Get (File => The_file, Item => C);
      counts (C) := counts (C) + 1;
   end loop;
   Close (The_file);
   for I in counts'Range loop
      Put_Line (I & ": " & counts (I)'Image);
   end loop;
end count_graphic_characters;

This program counts the frequency of characters in its own source file using the Ada programming language.

The subtype graphix is defined to contain all the graphic characters, starting at ' ' and ending at '~'. The array named counts is indexed by the characters in the subtype graphix. Each element of the array is an instance of the pre-defined subtype Natural and is initialized to 0. The array contains exactly enough elements to count every graphic character in the source file.

The program will raise an exception if the file named in the Open procedure cannot be found.

As each character is read from the file that character is used as an index into the counts array and the corresponding element is incremented.

No space is wasted by creating a 128-element array. Instead, an array of 95 elements is used. There is also no need to check each array element to determine whether the character represented by the index is printable, since the array index values are only the printable characters.

The output of this program is:

 :  132
!:  0
":  4
#:  0
$:  0
%:  0
&:  2
':  6
(:  10
):  10
*:  0
+:  1
,:  3
-:  0
.:  5
/:  0
0:  1
1:  1
2:  0
3:  0
4:  0
5:  0
6:  0
7:  0
8:  0
9:  0
::  6
;:  14
<:  0
=:  8
>:  6
?:  0
@:  0
A:  2
B:  0
C:  7
D:  0
E:  1
F:  5
G:  1
H:  0
I:  8
J:  0
K:  0
L:  1
M:  1
N:  2
O:  5
P:  1
Q:  0
R:  1
S:  0
T:  8
U:  0
V:  0
W:  0
X:  0
Y:  0
Z:  0
[:  0
\:  1
]:  0
^:  0
_:  18
`:  0
a:  26
b:  3
c:  21
d:  9
e:  43
f:  8
g:  9
h:  18
i:  22
j:  0
k:  0
l:  17
m:  3
n:  20
o:  22
p:  13
q:  0
r:  24
s:  15
t:  23
u:  13
v:  0
w:  2
x:  4
y:  3
z:  0
{:  0
|:  0
}:  0
~:  1
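
For comparison, the 95-element layout can be approximated in C by subtracting the code of ' ' from each character before indexing. This is only a sketch, not part of either answer: the names FIRST_GRAPHIC, LAST_GRAPHIC, GRAPHIC_COUNT, and count_graphic are hypothetical, and unlike the Ada version, which raises an exception on an out-of-range index, this code silently ignores characters outside ' ' .. '~'.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_GRAPHIC ' ' /* value 32 */
#define LAST_GRAPHIC  '~' /* value 126 */
#define GRAPHIC_COUNT (LAST_GRAPHIC - FIRST_GRAPHIC + 1) /* 95 slots */

/* Count the printable ASCII characters in text[0..n).
   counts[c - FIRST_GRAPHIC] holds the tally for character c;
   anything outside the printable range is skipped. */
static void count_graphic(const unsigned char *text, size_t n,
                          unsigned counts[GRAPHIC_COUNT]) {
    assert(text != NULL && counts != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char c = text[i];
        if (c >= FIRST_GRAPHIC && c <= LAST_GRAPHIC) {
            counts[c - FIRST_GRAPHIC]++;
        }
    }
}
```

The price of the smaller array is the explicit range check and the offset arithmetic on every access, which the Ada subtype handles implicitly.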

CodePudding user response:

The first 32 ASCII characters (values 0 - 31) are "non-printable" in the sense that they represent characters with special meaning or behavior. You can keep your code as it is, but limit the actual printout to "printable" characters only. You could start at space ' ' (value 32) and end at 'z' (value 122), which covers most of the printable characters (not only letters) and omits just '{', '|', '}' and '~'.

for (int i = ' '; i <= 'z'; i++) // use for loop to print frequency of characters
{       
    printf("%c: %d\n",table[i].value, table[i].frequency); // print characters and its frequency
}

From your printout one can see 10702 CR (value 13) and 10702 LF (value 10) characters, which reveals that the text file has 10702 line endings and is a Windows (CRLF) text file.
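
Equal CR and LF totals strongly suggest CRLF line endings, but the totals alone do not show that each CR is immediately followed by an LF. A rough sketch of a direct check is shown here; the struct line_endings and the function classify_line_endings are hypothetical names introduced only for illustration, and the data is assumed to already be in memory.

```c
#include <assert.h>
#include <stddef.h>

struct line_endings {
    size_t crlf;    /* "\r\n" pairs (Windows style) */
    size_t lone_lf; /* '\n' not preceded by '\r' (Unix style) */
    size_t lone_cr; /* '\r' not followed by '\n' */
};

/* Walk data[0..n) once and classify every line terminator found. */
static struct line_endings classify_line_endings(const unsigned char *data,
                                                 size_t n) {
    struct line_endings r = { 0, 0, 0 };
    assert(data != NULL);
    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\r') {
            if (i + 1 < n && data[i + 1] == '\n') {
                r.crlf++;
                i++; /* the LF belongs to this pair, so skip it */
            } else {
                r.lone_cr++;
            }
        } else if (data[i] == '\n') {
            r.lone_lf++;
        }
    }
    return r;
}
```

If the file really is a Windows text file, crlf would be expected to equal 10702 while lone_lf and lone_cr both stay at 0.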
