Reading a file and print lines that contain a certain stream of characters decided by the user

Time:08-31

Implement a program that receives the name of a text file and a format string via the command line. A format string is a string that contains only three possible characters: 'V', which indicates a vowel, 'C', indicating a consonant, and 'D', indicating a digit.

The program will then have to read the file and print to the screen the lines that match the format string.

Note: both uppercase and lowercase characters must be accepted in the format string; assume that the format string and the lines of the file contain at most 500 characters.

Example

Assuming that the format string is "VCD" and the file contains:

2og  
Ok1  
Ok  
oK2  

The program will print Ok1 and oK2 on the screen.

I've come up with this:

#include <stdio.h>
#include <string.h>

int main() {
  int x=1, i=0, flag=1, j=0;
  char str[500], n_file[100], formato[500],c;
  printf("\nInsert file name:\n");
  scanf("%s%*c",n_file);
  FILE *fr=fopen(n_file, "w");
  printf("\nTo quit from writing input 'quit'\n");
  while (x!=0) {
    fgets(str, sizeof str, stdin);
    if (strcmp(str, "quit\n")==0) {
      x=0;
    }else{
      fputs(str,fr);
    }
  }
  fclose(fr);
  printf("\nInsert the format string (V for vowels, C for consonants and D for digits):\n");
  scanf("%s%*c",formato);
  printf("%s",formato);
  FILE *fp=fopen(n_file, "r");
  while (!feof(fp)) {
    fgets(str, sizeof str, fp);
    while (str[i]!='\n' && str[i]!='\0' && flag!=0 && formato[j]!='\n') {
      printf("\n%c",str[i]);
      if (formato[j]=='V') {
        if (str[i]=='a' || str[i]=='A' || str[i]=='E' || str[i]=='e' || str[i]=='i' || str[i]=='I' || str[i]=='o' || str[i]=='O' || str[i]=='u' || str[i]=='U') {
          flag=1;
          i++;
          j++;
          printf("\nPrimo if");
        }else{
          flag=0;
        }
      }else if (formato[j]=='C'){
        if ((str[i]>='b'-32 && str[i]<='z'-32) || (str[i])>='b' && str[i]<='z') {
          flag=1;
          i++;
          j++;
          printf("\nSecondo if");
        }else{
          flag=0;
        }
      }else if (formato[j]=='D') {
        if (str[i]>=0 && str[i]<=9) {
          flag=1;
          i++;
          j++;
        } else {
          flag=0;
          printf("\nTerzo if");
        }
      }
    }
    if (flag==1) {
      printf("\n%s",str);
    }
    j=0;
    i=0;
  }
  return 0;
}

The idea behind the code is: I compare each character in the format string first with the vowels (I'm new to C, so comparing one by one against each vowel, capital or not, is the only way I thought of doing it), then with the consonants and then with the digits.

If I have a 'V' in the format string but the line does not have a vowel in that position, the flag gets set to 0, which ends the second while loop, and the str[] line gets replaced with the next line of the file.

I've put some printfs in the second while loop, and I don't know why the third if sets the flag to 0 even when the format string is "VCD" and I insert a digit as the last character.

Answer:

Alright, this is one of those where I can tell you have put effort and thought into what you are trying to do, you are wisely trying to track the "state" of whether a match or mismatch occurs, and you are using fgets() to read line by line -- but then the train falls completely off the track.
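
One concrete place where it derails is the digit test you asked about: str[i]>=0 && str[i]<=9 compares against the character codes 0 through 9 (control characters), not the digit characters '0' through '9'. A minimal sketch of the intended comparison, using a hypothetical helper name is_digit_char() (isdigit() from ctype.h does the same job):

```c
/* The question's test  str[i]>=0 && str[i]<=9  compares against the
 * character CODES 0..9, not the digit characters '0'..'9'.
 * Compare against character constants instead; the C standard
 * guarantees that '0'..'9' have consecutive codes. */
int is_digit_char (char c)
{
    return c >= '0' && c <= '9';
}
```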

The fact that you are supposed to read from the file containing the strings and write to the "screen" makes opening the file for writing with "w" and then filling it from stdin a bit bewildering (though you could redirect your file to stdin -- but why?)

But most of all, you need to look at the question "Why is while ( !feof (file) ) always wrong?" regarding your while (!feof(fp)) loop.
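
To see the feof() problem concretely, here is a small sketch with two hypothetical helpers that count loop iterations over the same stream. The end-of-file indicator is only set by a read that has already failed, so when the last line ends in a newline the feof() version runs its body one extra time (in the question's code, that extra pass re-checks the previous line still sitting in str).

```c
#include <stdio.h>

/* Anti-pattern (as in the question): test feof() BEFORE reading.
 * The EOF indicator is only set by a read that fails, so the body
 * runs once more after the last line has been read successfully. */
int count_iterations_feof (FILE *fp)
{
    char buf[512];
    int n = 0;

    while (!feof (fp)) {
        fgets (buf, sizeof buf, fp);   /* return value ignored: the bug */
        n++;                           /* counted even when fgets failed */
    }
    return n;
}

/* Correct pattern: condition the loop on the return of fgets(). */
int count_lines_fgets (FILE *fp)
{
    char buf[512];
    int n = 0;

    while (fgets (buf, sizeof buf, fp))
        n++;
    return n;
}
```

For a stream containing "2og\nOk1\n", count_lines_fgets() returns 2 while count_iterations_feof() returns 3.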

That said, how do you approach a problem like this? You take it step-by-step and think about tackling the following problems:

  • start by validating that the number of arguments is what is needed,
  • validate the format string is made up of only valid characters 'V', 'C' and/or 'D' (in any case),
  • open your input file and validate it is open for reading,
  • loop reading each line from your input file, conditioning your read-loop on the return from your read-function, e.g. while (fgets (line, size, fp)) { ... },
  • check if the line matches the format string, if so, print it, if not, get the next line (you thought through this part correctly)

One other thought to help keep your logic straight is to look at the steps and see whether it would make sense to write a function to handle each task. This keeps your code tidy and prevents filling main() with a never-ending jumble of code that is hard to follow.

For example, if you move the validation check for bad characters in the format string to its own function, and move the check of whether each line matches the format to its own function, your main() can be kept readable and easy to follow, e.g.

int main (int argc, char **argv) {

  char line[MAXC + 1], *fmt, *fname;  /* buffer for line, names for args */
  FILE *fp;                           /* file pointer for file stream */
  
  if (argc != NARG + 1) {   /* validate 2 arguments given (exe is always 1) */
    fprintf (stderr, "error: incorrect number of arguments, 2 required.\n"
                     "usage: %s fname fmt  (fmt: only 'V', 'C', 'D', any case)\n",
                     argv[0]);
    return 1;
  }
  
  fname = argv[1];    /* optional descriptive names for cmdline arguments */
  fmt = argv[2];
  
  if (bad_fmt_chars (fmt)) {  /* validate format string */
    return 1;
  }
  
  if (!(fp = fopen (fname, "r"))) { /* open/validate file open for reading */
    perror ("fopen-fname");
    return 1;
  }
  
  while (fgets (line, MAXC + 1, fp)) {    /* loop reading each line */
    
    line[strcspn (line, "\n")] = 0;       /* trim '\n' from line */
    
    if (!bad_line_fmt (line, fmt)) {      /* check line format */
      puts (line);
    }
  }
  
  fclose (fp);    /* close the file when done reading */
}

Adding the header includes, constants and function declarations used above to the top of the code, you would have a complete program, absent the two function definitions you need to write, e.g.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define NARG        2   /* if you need a constant, #define one (or more) */
#define MAXC      500           /* max chars in line */
#define VALIDC   "VCD"          /* valid chars for str */
#define VOWELS   "AEIOUaeiou"   /* vowels */

/* function prototypes (declarations) */
int bad_fmt_chars (const char *str);
int bad_line_fmt (const char *line, const char *fmt);

int main (int argc, char **argv) {

  char line[MAXC + 1], *fmt, *fname;  /* buffer for line, names for args */
  FILE *fp;                           /* file pointer for file stream */
  
  if (argc != NARG + 1) {   /* validate 2 arguments given (exe is always 1) */
    fprintf (stderr, "error: incorrect number of arguments, 2 required.\n"
                     "usage: %s fname fmt  (fmt: only 'V', 'C', 'D', any case)\n",
                     argv[0]);
    return 1;
  }
  
  fname = argv[1];    /* optional descriptive names for cmdline arguments */
  fmt = argv[2];
  
  if (bad_fmt_chars (fmt)) {  /* validate format string */
    return 1;
  }
  
  if (!(fp = fopen (fname, "r"))) { /* open/validate file open for reading */
    perror ("fopen-fname");
    return 1;
  }
  
  while (fgets (line, MAXC + 1, fp)) {    /* loop reading each line */
    
    line[strcspn (line, "\n")] = 0;       /* trim '\n' from line */
    
    if (!bad_line_fmt (line, fmt)) {      /* check line format */
      puts (line);
    }
  }
  
  fclose (fp);    /* close the file when done reading */
}

Now the two functions. First, how do you check whether the user entered any bad characters in the format string? While you can use a nested loop like you started to do, you can also use the strchr() function to replace the inner loop, since it easily scans your string of allowable (valid) characters to see if the current character is found in it. (You can make the check case-insensitive by converting the current character to upper or lower case, consistent with your definition of the valid characters.) The usage and required header for strchr() can be found in its man-page, man 3 strchr (see footnote 1).
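
The same strchr() trick also replaces the question's long chain of str[i]=='a' || str[i]=='A' || ... comparisons. A minimal sketch with hypothetical helpers is_vowel() and is_consonant(); note the c != '\0' guard, since strchr() treats the terminating null character as part of the string and would otherwise "find" it:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: 1 if c is a vowel (either case), else 0.
 * strchr() counts the terminating '\0' as part of the string, so a
 * '\0' argument would be "found"; reject it explicitly. */
int is_vowel (int c)
{
    return c != '\0' && strchr ("AEIOUaeiou", c) != NULL;
}

/* Hypothetical helper: 1 if c is a letter that is not a vowel.
 * The cast keeps isalpha() within its required argument range. */
int is_consonant (int c)
{
    return isalpha ((unsigned char) c) && !is_vowel (c);
}
```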

When writing any function that can succeed or fail, or that provides an informational return, choose an appropriate return type for the function. For simple true/false returns (good/no-good, etc.), returning a simple int is fine. You can also include stdbool.h and return type bool in that case -- up to you.

Using a simple int return, returning 1 to indicate a bad character was found or 0 if no offending characters were found, and choosing a name that makes those returns make sense, you could do something like the following to validate the format string, e.g.
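
If you prefer the stdbool.h route, a sketch of the same format check returning bool might look like this. fmt_is_valid() is a hypothetical name; unlike the bad_fmt_chars() version below it prints nothing, and, like it, it accepts an empty format string.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical bool variant: true if every char of fmt is V, C or D
 * (either case). Prints nothing; the caller decides how to report. */
bool fmt_is_valid (const char *fmt)
{
    for (; *fmt; fmt++) {
        /* *fmt is never '\0' here, so strchr() cannot match the terminator */
        if (!strchr ("VCD", toupper ((unsigned char) *fmt)))
            return false;
    }
    return true;
}
```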

/* check if format string has bad char not one of "VCD",
 * returns 1 if bad char found, 0 otherwise
 */
int bad_fmt_chars (const char *str)
{
  for (int i = 0; str[i]; i++) {      /* loop over chars in str */
    
    /* case insensitive char not in VALIDC, return err
     * (cast: toupper() requires a value representable as unsigned char) */
    if (!strchr (VALIDC, toupper ((unsigned char) str[i]))) {
      fprintf (stderr, "error: invalid char '%c' is not one of \"VCD\".\n",
               str[i]);
      return 1;
    }
  }
  
  return 0;   /* return good (no bad chars) */
}

The function simply scans each character in the format string calling strchr() in a case insensitive way to check that each character is one of the VALIDC characters. It returns 1 immediately if an invalid format character is found, or 0 after scanning all characters.

Checking whether the line meets the format string can be handled in a number of different ways. One of the simplest is to loop over each character read from the file and use a switch statement on the corresponding character from the format string to make sure the current character matches the format character. You can use case fall-through to do that in a case-insensitive way (any case without a corresponding break falls through to the next case). So if you group the cases for 'V' and 'v' together without a break in the first, you will fall through to the second, allowing you to write the logic to handle a vowel only once (same for consonant and digit).
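
A stripped-down sketch of that grouping, using a hypothetical fmt_class() that maps either case of a format character to its upper-case class. Each lower-case label has no statements of its own, so control falls through to the upper-case label directly below it.

```c
/* Hypothetical helper: map a format char (either case) to 'V', 'C' or
 * 'D', or 0 if it is not a format char. Each lower-case label has no
 * statements, so control falls through to the label below it. */
int fmt_class (int f)
{
    switch (f) {
        case 'v':
        case 'V': return 'V';
        case 'c':
        case 'C': return 'C';
        case 'd':
        case 'D': return 'D';
        default:  return 0;
    }
}
```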

What do you need to do for each character of input? Each character in line will have a corresponding format character in the format string. So for each character, you need to switch on the corresponding format character to determine if the current character is:

  • a vowel if the format character is 'v' or 'V',
  • a consonant if the format character is 'c' or 'C', or
  • a digit if the format character is 'd' or 'D'.

Putting that together and using a mismatch flag to indicate where a mismatch in format is found, you could do something similar to the following:

/* confirm each character in line matches format char in fmt,
 * returns 1 on any format mismatch or difference in length,
 * returns 0 if format matches exactly.
 */
int bad_line_fmt (const char *line, const char *fmt)
{
  int i = 0;      /* declare i to survive loop */
  
  /* both line and format chars must be available */
  for (; line[i] && fmt[i]; i++) {
    int mismatch = 0;
    unsigned char c = (unsigned char) line[i];  /* safe for ctype.h */
    /* case fall-through intentional */
    switch (fmt[i]) {
      case 'v': /* mismatch if not a vowel */
      case 'V': if (!strchr (VOWELS, c)) { mismatch = 1; } break;
      case 'c': /* mismatch if not a letter, or if a vowel */
      case 'C': if (!isalpha (c) || strchr (VOWELS, c)) {
                  mismatch = 1;
                }
                break;
      case 'd': /* mismatch if not a digit */
      case 'D': if (!isdigit (c)) { mismatch = 1; } break;
    }
    
    if (mismatch) {   /* if mismatch between format and char */
      return 1;       /* return bad format */
    }
  }
  
  if (line[i] || fmt[i]) {  /* if either line or fmt not at end */
    return 1;               /* return bad format */
  }
  
  return 0;     /* return line matches format - return good */
}

(note: thanks to @Fe2O3 for finding the 'c' and 'C' cases I overlooked, proving the point that extra eyes are always a benefit)

That is the completed function. You should also always compile with full warnings enabled, which means the compiler may warn about case fall-through (GCC's -Wimplicit-fallthrough, enabled by -Wextra, flags a case whose statements run on into the next one; stacked empty labels such as case 'v': case 'V': are normally not flagged). The compiler has an option to eliminate that warning (you telling the compiler, yes, case fall-through was intentional). For gcc that option is -Wno-implicit-fallthrough. That allows you to compile your program with no warnings. Never accept code until it compiles without warning. Setting -Werror tells the compiler to treat warnings as errors (an easy way to keep yourself honest).

Example Use/Output

Does it work? With your input in the file dat/formatvcd.txt, the program produces the following output:

$ ./bin/formatVCD dat/formatvcd.txt "VCD"
Ok1
oK2

What about the case-insensitive format string?

$ ./bin/formatVCD dat/formatvcd.txt "VcD"
Ok1
oK2

How about matching other formats?

$ ./bin/formatVCD dat/formatvcd.txt "DVC"
2og

or

$ ./bin/formatVCD dat/formatvcd.txt "VC"
Ok

What about the format string check?

$ ./bin/formatVCD dat/formatvcd.txt "VqD"
error: invalid char 'q' is not one of "VCD".

What about a short format string?

$ ./bin/formatVCD dat/formatvcd.txt "V"
(no output)

Always try and test all cases you can think of to fully test your code. Even doing that, there are usually creative folks here that can point out cases you haven't considered.

If you have trouble putting the code together, let me know and I'm happy to post a completed version. All you need to do is paste the two functions below main() in the complete program above. If you are compiling with gcc, you can use:

$ gcc -Wall -Wextra -pedantic -Wshadow -Werror -Wno-implicit-fallthrough -std=c11 -Ofast -o formatVCD formatVCD.c

That compiles the source formatVCD.c into the executable formatVCD with full warnings enabled, treating warnings as errors. All modern compilers have equivalent options; just check their documentation.

footnotes:

  1. man pages are a bit cryptic when you first look at them, but they are very well written to give you the precise declaration, syntax, usage and any header or definitions required. Take a few minutes and make friends with the man-page format and then use them to lookup each function you use.

Answer:

@David Rankin has provided an OUTSTANDING answer to this question. (Kudos Mr. Rankin!)

Because one should always strive for more, I'm posting my 'abbreviated' version of the algorithm. I didn't mess with files or command line arguments in order to focus on maximising the effectiveness of the code. Padding this out with validation (and perhaps some "data conditioning") would detract from its brevity.

For your consideration (and prepared for the downvotes.)

#include <stdio.h>
#include <string.h>
#include <ctype.h>

int main() {
    char *legit[] = {
        "",
        "BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz",
        "0123456789",
        "AEIOUaeiou",
    };

    char *formato = "VCD";
    char *arr[] = { "2og","Ok1","Ok","oK2", };

    for( int i = 0; i < sizeof arr/sizeof arr[0]; i++ ) {
        char *str = arr[ i ];
        char *wnt = formato;

        while( *wnt && *str && strchr( legit[ *wnt>>1&3 ], *str ) )
            wnt++, str++;

        if ( !*str && !*wnt )
            puts( arr[ i ] );
    }

    return 0;
}

Output:

Ok1
oK2
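
The index expression *wnt>>1&3 is what replaces the switch here. Since >> binds tighter than &, it shifts the format character right one bit and keeps the low two bits: 'C' (67) gives 1, 'D' (68) gives 2 and 'V' (86) gives 3, matching the order of legit[]. Lower-case letters differ only in the 0x20 bit, which the shift moves to 0x10 and the mask discards, so 'c', 'd' and 'v' land in the same slots. A small sketch that isolates the expression in a hypothetical legit_index() helper:

```c
/* Same computation as  *wnt>>1&3  in the answer above (parentheses
 * added only for readability): shift right one bit, keep the low two.
 * 'C' (67) -> 1, 'D' (68) -> 2, 'V' (86) -> 3; lower case differs only
 * in bit 0x20, which the shift moves to 0x10 and the mask drops. */
int legit_index (int f)
{
    return (f >> 1) & 3;
}
```

The flip side is that other characters land somewhere too: 'W' (87) also maps to 3 and would be treated like 'V', which is one reason to keep validating the format string as in the first answer.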

With nothing better to do, here's another alternative that doesn't have the overhead of function calls for each character being checked. (This is a custom version of the facilities of some of the "ctype" functions.) This version relies on the "format specifier string" being all upper case.

#include <stdio.h>

#define V 'V'
#define C 'C'
#define D 'D'
#define B 1
char tbl[] = {
    B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, 
    B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, 
    B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, B, 
    D, D, D, D, D, D, D, D, D, D, B, B, B, B, B, B, 
    B, V, C, C, C, V, C, C, C, V, C, C, C, C, C, V, 
    C, C, C, C, C, V, C, C, C, C, C, B, B, B, B, B, 
    B, V, C, C, C, V, C, C, C, V, C, C, C, C, C, V, 
    C, C, C, C, C, V, C, C, C, C, C, B, B, B, B, B, 
};
#undef B
#undef D
#undef C
#undef V

int main() {
    char *formato = "VCD";
    char *arr[] = { "2og", "Ok1", "Ok", "oK2", };

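    for( int i = 0; i < sizeof arr/sizeof arr[0]; i++ ) {
        char *str = arr[ i ];
        char *wnt = formato;

        /* cast: a negative or >127 char must not index outside tbl[] */
        while( (unsigned char)*str < sizeof tbl && *wnt == tbl[ (unsigned char)*str ] )
            wnt++, str++;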

        if ( !*str && !*wnt )
            puts( arr[ i ] );
    }
    return 0;
}
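
If you like the lookup-table idea but not typing the table by hand, one alternative (a sketch, not part of the answer above) is to fill a 256-entry table once at startup from the same vowel list and the ctype.h classifiers, then index it with an unsigned char so every possible byte is covered. init_cls(), line_matches(), cls[] and CLS_OTHER are hypothetical names; as in the answer, the format string is assumed to be upper case.

```c
#include <ctype.h>
#include <string.h>

enum { CLS_OTHER = 1 };   /* never equal to 'V', 'C', 'D' or '\0' */

static char cls[256];     /* hypothetical runtime-built class table */

/* Fill cls[] once: 'V' for vowels, 'C' for other letters, 'D' for
 * digits, CLS_OTHER for everything else (including '\0'). */
void init_cls (void)
{
    for (int c = 0; c < 256; c++) {
        if (c != 0 && strchr ("AEIOUaeiou", c))
            cls[c] = 'V';
        else if (isalpha (c))
            cls[c] = 'C';
        else if (isdigit (c))
            cls[c] = 'D';
        else
            cls[c] = CLS_OTHER;
    }
}

/* 1 if each char of line has the class named by the matching fmt char
 * (upper-case fmt only) and both strings end together, else 0. */
int line_matches (const char *line, const char *fmt)
{
    while (*fmt && *fmt == cls[(unsigned char) *line])
        fmt++, line++;
    return !*fmt && !*line;
}
```

Call init_cls() once before the first line_matches(); after that each character check is a single array lookup.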