How do I make sscanf ignore white spaces? C

Time:03-28

I'm trying to read formatted data with sscanf in this format:

"%d,%d,%d"

for example,

sscanf(buffer, "%d,%d,%d", &n1, &n2, &n3)

and it is critical that there is no whitespace between the numbers, because the input comes like this, for example:

109,304,249
194,204,482

etc.

But when I give sscanf an input like:

13, 56, 89
145,    646,    75

it still reads it, even though it is not in the specified format and should be invalid.

Is there a way to write something such that sscanf will work as I expect it to?

CodePudding user response:

scanf ignores leading whitespace for most conversions, but what you want is the opposite of ignoring whitespace.

You cannot tell scanf to fail on whitespace, but you can detect how many whitespace characters were consumed. For example:

  #include <stdio.h>

  int main()
  {
      int first, second, before, after;
      int nread;

      nread = scanf("%d,%n %n%d", &first, &before, &after, &second);
      if (nread != 2) {
          printf ("Error in the input stream, 2 items expected, %d matched\n", nread);
      }
      else if (before != after) {
          printf ("Error in the input stream, %d whitespace characters detected\n", 
                      after-before);
      }
      else {
          printf ("Got numbers %d %d\n", first, second);
      }
  }

This only detects whitespace before the second number; whitespace before the first number is still skipped silently by %d.
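The same %n trick can be extended to the full "%d,%d,%d" format from the question. The sketch below is only one way to do it, and the function name parse_csv3_no_spaces is hypothetical. It records the offsets where the second and third numbers start and rejects the line if whitespace sits at any of those positions, or if anything other than an optional newline follows the third number. Whitespace before a comma already makes the literal "," directive fail to match, so it needs no extra check. Signs and overflow are not handled: "+1" is still accepted, and an out-of-range value is still undefined behavior.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: parses "a,b,c" (optionally followed by '\n') and
 * returns 1 only if no whitespace other than that final newline appears.
 * %n does not count toward sscanf()'s return value, so 3 means that all
 * three %d conversions succeeded. Out-of-range values are still UB. */
int parse_csv3_no_spaces(const char *line, int *a, int *b, int *c)
{
    int second = 0, third = 0, end = 0;   /* offsets recorded by %n */

    /* %d would silently skip leading whitespace, so check it by hand. */
    if (isspace((unsigned char)line[0]))
        return 0;
    if (sscanf(line, "%d,%n%d,%n%d%n", a, &second, b, &third, c, &end) != 3)
        return 0;
    /* Whitespace right after a comma would have been skipped by %d. */
    if (isspace((unsigned char)line[second]) ||
        isspace((unsigned char)line[third]))
        return 0;
    /* Nothing but an optional newline may follow the third number. */
    return line[end] == '\0' || (line[end] == '\n' && line[end + 1] == '\0');
}
```

Called on each line read with fgets(), this accepts 109,304,249 and rejects 13, 56, 89.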

Having said that, erroring out on whitespace is probably not a good idea.

CodePudding user response:

The %d conversion skips leading whitespace and accepts an optional sign. If you want stricter validation, you can:

  • use an extra validation phase before or after converting the values.
  • use a hand coded conversion that rejects malformed input.

Here are some suggestions:

#include <stdio.h>
#include <string.h>

int main() {
    char input[100];
    char verif[100];
    int a, b, c;

    if (fgets(input, sizeof input, stdin)) {
        if (sscanf(input, "%d,%d,%d", &a, &b, &c) == 3) {
            snprintf(verif, sizeof verif, "%d,%d,%d\n", a, b, c);
            if (strcmp(input, verif) == 0) {
                printf("input is valid, a=%d, b=%d, c=%d\n", a, b, c);
                return 0;
            } else {
                printf("input contains extra characters\n");
                return 1;
            }
        } else {
            printf("input does not match the pattern\n");
            return 1;
        }
    } else {
       printf("no input\n");
       return 1;
    }
}
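One drawback of the comparison above is that a last line without a trailing newline is rejected, because verif always ends in '\n'. A possible variant, sketched here under the assumption that exactly one line is passed per call (for example as returned by fgets()), strips an optional final newline before comparing. The helper name parse_csv3_canonical is hypothetical, and overlong numbers are still undefined behavior because the conversion is still done with %d.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if line is exactly the canonical
 * "%d,%d,%d" rendering of the three values it contains, optionally
 * followed by a single '\n'. Spaces, '+' signs and leading zeroes all
 * make the comparison fail. */
int parse_csv3_canonical(const char *line, int *a, int *b, int *c)
{
    char copy[100], canon[100];
    size_t len = strcspn(line, "\n");    /* length before the first '\n' */

    /* Allow at most one newline, and only as the very last character. */
    if (line[len] == '\n' && line[len + 1] != '\0')
        return 0;
    if (len >= sizeof copy)
        return 0;                        /* too long for a valid line here */
    memcpy(copy, line, len);
    copy[len] = '\0';

    if (sscanf(copy, "%d,%d,%d", a, b, c) != 3)
        return 0;
    snprintf(canon, sizeof canon, "%d,%d,%d", *a, *b, *c);
    return strcmp(copy, canon) == 0;
}
```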

You could just check if the input string contains any white space:

    if (input[strcspn(input, " \t")] != '\0') {
        printf("input string contains spaces or TABs\n");
    }

But negative values, redundant signs and leading zeroes will pass this test (e.g. "-1,+1,0001").

If all values must be non-negative, you could use sscanf() to perform a poor man's version of pattern matching:

    if (fgets(input, sizeof input, stdin)) {
        char c1, c2;
        if (sscanf(input, "%*[0-9],%*[0-9],%*[0-9]%c%c", &c1, &c2) == 1 && c1 == '\n') {
            /* input line contains exactly 3 numbers separated by a `,` */
        } else {
            printf("invalid format\n");
        }
    }

Note however these remarks:

  • redundant leading zeroes would still be accepted by this validation phase (e.g. "00,01,02").
  • overlong numbers will technically cause undefined behavior in all of the above methods once they are converted with %d (e.g. "0,0,999999999999999999999999999999"), but the first approach will detect this problem if the conversion just truncates or clamps the converted value.
  • an overlong input line (longer than 98 bytes plus a newline) will be truncated by fgets(), leaving the rest of the input for the next read operation. This may be a problem too.
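The hand-coded conversion mentioned at the start of this answer can address the overflow remark. The following is only a sketch, and the helper names parse_int_strict and parse_csv3_strict are hypothetical: it converts with strtol() and checks errno and the int range instead of relying on %d, and it refuses whitespace and '+' by requiring each number to start with a digit, or with '-' immediately followed by a digit. As written it still accepts leading zeroes such as "00,01,02"; an extra check on the first digit would be needed to reject them.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses one int starting exactly at *sp. Unlike %d, it refuses leading
 * whitespace and '+' signs and reports overflow instead of invoking UB.
 * On success, advances *sp past the digits and returns 1. */
static int parse_int_strict(const char **sp, int *out)
{
    const char *s = *sp;
    char *end;
    long v;

    /* Require a digit, or '-' immediately followed by a digit. */
    if (!(isdigit((unsigned char)s[0]) ||
          (s[0] == '-' && isdigit((unsigned char)s[1]))))
        return 0;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return 0;                     /* value does not fit in an int */

    *out = (int)v;
    *sp = end;
    return 1;
}

/* Hypothetical helper: accepts exactly "a,b,c", optionally followed by '\n'. */
int parse_csv3_strict(const char *line, int *a, int *b, int *c)
{
    const char *s = line;
    int *dst[3] = { a, b, c };

    for (int i = 0; i < 3; i++) {
        if (i > 0 && *s++ != ',')     /* a comma must separate the numbers */
            return 0;
        if (!parse_int_strict(&s, dst[i]))
            return 0;
    }
    return *s == '\0' || (s[0] == '\n' && s[1] == '\0');
}
```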

The solution to your problem depends on how strict and concise the validation must be.

CodePudding user response:

One way to validate the input is to use a regular expression that matches three integers delimited by commas, with no other characters.

#include <stdio.h>
#include <stdlib.h>
#include <regex.h>
#include <string.h>

int main()
{
    regex_t preg;                               // compiled regex pattern
    int ret;
    int n1, n2, n3;
    char *p;                                    // pointer to a character in the string
    char buffer[BUFSIZ];                        // input buffer

    // compile regex to match three integers delimited by commas
    ret = regcomp(&preg, "^([+-]?[[:digit:]]+,){2}[+-]?[[:digit:]]+$", REG_EXTENDED);
    if (ret) {
        fprintf(stderr, "Compile error of regex\n");
        exit(1);
    }

    while (fgets(buffer, sizeof buffer, stdin)) {
        if ((p = strchr(buffer, '\n'))) *p = '\0'; // remove trailing newline
        ret = regexec(&preg, buffer, 0, NULL, 0);
        // input matches the regex pattern and all three values were converted
        if (!ret && sscanf(buffer, "%d,%d,%d", &n1, &n2, &n3) == 3) {
            printf("=> %d,%d,%d\n", n1, n2, n3);
        } else {
            printf("Error in the input: %s\n", buffer);
        }
    }

    regfree(&preg);                             // release the compiled pattern
    return 0;
}

where ^([+-]?[[:digit:]]+,){2}[+-]?[[:digit:]]+$ is the regex:

  • ^ anchors the start of the input string.
  • [+-]? matches zero or one leading plus or minus sign.
  • [[:digit:]]+, matches a sequence of one or more digits followed by a comma.
  • The parentheses ( pattern ) make a group of the pattern.
  • The quantifier {2} specifies the number of repetitions of the previous atom (here, the group).
  • $ anchors the end of the input string.
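If redundant '+' signs and leading zeroes (discussed in the previous answer) should be rejected as well, the pattern can be tightened. A possible sketch, with a hypothetical function name, accepts in each position either 0 or an optionally negative number that does not start with 0, so "-0", "01" and "+1" fail. It compiles the pattern on every call for simplicity; in a loop like the one above you would compile it once with regcomp() and call regfree() at the end. Matching the pattern still does not guarantee that each value fits in an int.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if line (trailing newline already removed)
 * is exactly three integers separated by commas, with no spaces, no '+'
 * signs, no leading zeroes and no "-0"; 0 if it does not match; -1 if the
 * pattern fails to compile. */
int matches_csv3_canonical(const char *line)
{
    static const char pattern[] =
        "^(0|-?[1-9][0-9]*)(,(0|-?[1-9][0-9]*)){2}$";
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);   /* 0 means the line matched */
    regfree(&re);
    return rc == 0;
}
```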