I'm trying to read formatted data with sscanf in this format:
"%d,%d,%d"
for example,
sscanf(buffer, "%d,%d,%d", n1, n2, n3)
and it is critic that there is no white space between every number, because the input comes like this, for example:
109,304,249
194,204,482
etc.
But when I give sscanf an input like:
13, 56, 89
145, 646, 75
it still reads it even though it is not in the specifed format, and should be invalid
Is there is a way to write something such that sscanf will work as I expect it to work?
CodePudding user response:
scanf
ignores leading whitespace for most conversions, but what you want is the opposite of ignoring whitespace.
You cannot tell scanf
to error out on a whitespace, but you can detect how many whitespace characters were consumed. For example:
#include <stdio.h>
int main()
{
int first, second, before, after;
int nread;
nread = scanf("%d,%n %n%d", &first, &before, &after, &second);
if (nread != 2) {
printf ("Error in the input stream, 2 items expected, %d matched\n", nread);
}
else if (before != after) {
printf ("Error in the input stream, %d whitespace characters detected\n",
after-before);
}
else {
printf ("Got numbers %d %d\n", first, second);
}
}
This detects whitespace before the second input.
Having said that, erroring out on whitespace is probably not a good idea.
CodePudding user response:
The %d
conversion format ignores leading white space as well as an optional
sign. If you want a stricter validation, you can:
- use an extra validation phase before or after converting the values.
- use a hand coded conversion that rejects malformed input.
Here are some propositions:
#include <stdio.h>
#include <string.h>
int main() {
char input[100];
char verif[100];
int a, b, c;
if (fgets(input, sizeof input, stdin)) {
if (sscanf(input, "%d,%d,%d", &a, &b, &c) == 3) {
snprintf(verif, sizeof verif, "%d,%d,%d\n", a, b, c);
if (strcmp(input, verif) == 0) {
printf("input is valid, a=%d, b=%d, c=%d\n", a, b, c);
return 0;
} else {
printf("input contains extra characters\n");
return 1;
}
} else {
printf("input does not match the pattern\n");
return 1;
}
} else {
printf("no input\n");
return 1;
}
}
You could just check if the input string contains any white space:
if (input[strcspn(input, " \t")] != '\0') {
printf("input string contains spaces or TABs\n");
}
But negative values, redundant
signs and leading zeroes will pass the test (eg: "-1, 1, 0001"
).
If all values must be positive, you could use scanf()
to perform a poor man's pattern matching:
if (fgets(input, sizeof input, stdin)) {
char c1, c2;
if (sscanf(input, "%*[0-9],%*[0-9],%*[0-9]%c%c", &c1, &c2) == 1 && c1 == '\n') {
/* input line contains exactly 3 numbers separated by a `,` */
} else {
printf("invalid format\n");
}
}
Note however these remarks:
- redundant leading zeroes would still be accepted by this validation phase (eg:
"00,01,02"
). - overlong numbers will technically cause undefined behavior in all of the above methods (eg:
"0,0,999999999999999999999999999999"
), But the first approach will detect this problem if the conversion just truncates or maximises the converted value. - an overlong input line (longer than 98 bytes plus a newline) will be truncated by
fgets()
, leaving the rest of the input for the next read operation. This may be a problem too.
The solution to your problem depends on how strict and concise the validation must be.
CodePudding user response:
One idea to validate the input is to use the regular expression to match three integers delimited by commas without including other characters.
#include <stdio.h>
#include <regex.h>
#include <string.h>
int main()
{
regex_t preg; // regex pattern
int ret;
int n1, n2, n3;
char *p; // pointer to a character th the string
char buffer[BUFSIZ]; // input buffer
// compile regex to match three integers delimited by commas
ret = regcomp(&preg, "^([ -]?[[:digit:]] ,){2}[ -]?[[:digit:]] $", REG_EXTENDED);
if (ret) {
fprintf(stderr, "Compile error of regex\n");
exit(1);
}
while (fgets(buffer, BUFSIZ, stdin)) {
if ((p = rindex(buffer, '\n'))) *p = 0; // remove trailing newline
ret = regexec(&preg, buffer, 0, NULL, 0);
if (!ret) { // input matches the regex pattern
sscanf(buffer, "%d,%d,%d", &n1, &n2, &n3);
printf("=> %d,%d,%d\n", n1, n2, n3);
} else {
printf("Error in the input: %s\n", buffer);
}
}
return 0;
}
where ^([ -]?[[:digit:]] ,){2}[ -]?[[:digit:]] $
is the regex:
^
anchors the start of the input string.[ -]?
matches zero or one leading plus or minus sign.[[:digit:]] ,
matches the sequence of digits followed by a comma.- The parentheses
( pattern )
makes a group of the pattern. - The quantifier
{2}
specifies the number of repetition of the previous atom (including group). $
anchors the end of the input string.