The following will use lorem.txt as the test file:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
I have the following code, which is meant to count lines, words, and characters in a file (trying to imitate wc in Linux):
#include <stdio.h>
int main(){
    char data[500032]; // assigns 500KB of space for input string
    if (fgets(data, sizeof data, stdin)) {
        char *ptr = &data[0]; // initializes pointer at first character
        int count = 0; // total character count
        int d1_count = 0; // newline count
        int d23_count = 0; // ' ' and '\t' count
        while (*ptr){
            char d1 = '\n';
            char d2 = ' ';
            char d3 = '\t';
            count++; // counts character
            if (*ptr == d1){
                d1_count++; // counts newline
            }
            if (*ptr == d2 || *ptr == d3) {
                d23_count++; // counts spaces or tabs
            }
            ptr++; // increments pointer
        }
        printf("%d %d %d\n", d1_count, d23_count + 1, count-1);
    }
}
In my Linux terminal, I use gcc -o wordc wordc.c to compile and then ./wordc < lorem.txt to run it. However, I get 1 69 445 (1 line, 69 words, and 445 characters). This is the number of lines, words, and characters for the first paragraph only. I am expecting 7 lines, 207 words, and 1342 characters.
I assume what is happening is C stops reading the file once it finds a newline. How do I get it to stop doing this?
As an aside, I feel like assigning 500KB of space for a string is a bit hacky and wasteful. Are there any good ways to assign only as much space as I need?
Any help would be appreciated.
CodePudding user response:
Change the line
if (fgets(data, sizeof data, stdin)) {
to
while (fgets(data, sizeof data, stdin)) {
so that you are reading one line per loop iteration.
You will also have to move the lines
int count = 0; // total character count
int d1_count = 0; // newline count
int d23_count = 0; // ' ' and '\t' count
outside the loop, because you want to remember these values between loop iterations.
You will also want to move the line
printf("%d %d %d\n", d1_count, d23_count + 1, count-1);
outside the loop if you want to print that line only once, instead of once per loop iteration.
Regarding your aside: "I feel like assigning 500KB of space for a string is a bit hacky and wasteful. Are there any good ways to assign only as much space as I need?"
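The buffer only needs to be large enough to store a single line; it does not have to hold the entire file at once. A significantly smaller buffer would therefore probably be sufficient. Even if a line is longer than the buffer, fgets simply returns it in several pieces, which does not affect a per-character count like this one.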
Although it would be possible to use a dynamically allocated buffer (using malloc) and resize the buffer as necessary (using realloc), in this case, it is probably not necessary.
Since you stated in the question that you are using Linux, an alternative would be to use the POSIX-specific function getline, which handles most of the memory management for you.
I have rewritten your program to use getline:
#include <stdio.h>
#include <stdlib.h>
int main() {
    char *data = NULL;
    size_t data_capacity = 0;
    int count = 0; // total character count
    int d1_count = 0; // newline count
    int d23_count = 0; // ' ' and '\t' count
    while ( getline( &data, &data_capacity, stdin ) >= 0 ) {
        char *ptr = &data[0]; // initializes pointer at first character
        while (*ptr){
            char d1 = '\n';
            char d2 = ' ';
            char d3 = '\t';
            count++; // counts character
            if (*ptr == d1){
                d1_count++; // counts newline
            }
            if (*ptr == d2 || *ptr == d3) {
                d23_count++; // counts spaces or tabs
            }
            ptr++; // increments pointer
        }
    }
    free( data ); // getline allocated the buffer, so it must be freed
    printf("%d %d %d\n", d1_count, d23_count + 1, count-1);
}
With the input specified in the question, this program has the following output:
5 205 1339