What is the best way to match a string to specified format?

Time:03-26

The format that I want to match the string to is "from:<%s>" or "FROM:<%s>". The %s can be any length of characters representing an email address.

I have been using sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>", output). But it doesn't catch the case where the last ">" is missing. Is there a clean way to check if the input string is correctly formatted?

CodePudding user response:

You can't directly tell whether trailing literal characters in a format string are matched; there's no direct way for sscanf() to report their absence. However, there are a couple of tricks that'll do the job:

Option 1:

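int n = 0;
if (sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>%n", email, &n) != 1) {
    /* error: prefix or address did not match */
} else if (n == 0) {
    /* missing >: %n is only stored once the literal > has matched */
}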

Option 2:

char c = '\0';
if (sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]%c", email, &c) != 2) {
    /* error: malformed prefix or > missing */
} else if (c != '>') {
    /* error: something other than > after email address */
}
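Both options can be wrapped in small functions to try them out. The following is a minimal sketch, not a definitive implementation: the function names check_with_n and check_with_c, the 100-byte buffer with its %99 width limit, and the return codes are assumptions made for illustration. The '-' is also moved to the end of the scan-set so that it is taken literally rather than as part of a range.

```c
#include <stdio.h>

/* Option 1 (hypothetical wrapper): %n is stored only if the literal '>'
 * was matched. Returns 1 if '>' was found, 0 if it is missing, and -1 if
 * the prefix or address did not match at all. */
int check_with_n(const char *input, char email[100])
{
    int n = 0;
    /* '-' is last in the scan-set so it is a literal, not a range marker */
    if (sscanf(input, "%*[fromFROM:<]%99[@:,.A-Za-z0-9-]>%n", email, &n) != 1)
        return -1;
    return n != 0;
}

/* Option 2 (hypothetical wrapper): read whatever character follows the
 * address into c. Returns 1 if it is '>', 0 if it is something else, and
 * -1 if the prefix, the address, or the following character is absent. */
int check_with_c(const char *input, char email[100])
{
    char c = '\0';
    if (sscanf(input, "%*[fromFROM:<]%99[@:,.A-Za-z0-9-]%c", email, &c) != 2)
        return -1;
    return c == '>';
}
```

Neither function checks what comes after the '>', so trailing characters are still accepted.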

Note that the 'from' scan-set will match ROFF or MorfROM or <FROM:morf as a prefix to the email address. That's probably too generous. Indeed, it would match: from:<foofoomoo of from:<[email protected]>, which is a much more serious problem, especially as you throw the whole of the matched material away. You should probably capture the value and be more specific:

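char c = '\0';
char from[5];
/* %c (not %[>]) is used for the last field: %[ would store a string plus
   its terminating null, which does not fit in a single char */
if (sscanf(input, "%4[fromFROM]:<%[@:-,.A-Za-z0-9]%c", from, email, &c) != 3) {
    /* error: malformed prefix, empty address, or nothing after the address */
} else if (strcasecmp(from, "FROM") != 0) {
    /* not from */
} else if (c != '>') {
    /* missing > */
}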

or you can compare using strcmp() with "from" and "FROM" if that's what you want. The options here are legion. Be aware that strcasecmp() is a POSIX function declared in <strings.h>; Microsoft provides the equivalent _stricmp().
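As a sketch of the strcmp() variant, which avoids the POSIX dependency, the check might be packaged like this. The function name parse_from, the 100-byte email buffer with its %99 width limit, and the 0/1 return convention are assumptions for illustration; the '-' is placed last in the scan-set so it is read as a literal.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if input starts with from:<addr> or
 * FROM:<addr>, storing addr in email; returns 0 otherwise. */
int parse_from(const char *input, char email[100])
{
    char from[5];   /* up to 4 prefix letters plus the terminating null */
    char c = '\0';  /* the character that follows the address */

    if (sscanf(input, "%4[fromFROM]:<%99[@:,.A-Za-z0-9-]%c", from, email, &c) != 3)
        return 0;   /* malformed prefix, empty address, or nothing after it */

    /* accept exactly "from" or "FROM"; rejects MORF, From, and so on */
    if (strcmp(from, "from") != 0 && strcmp(from, "FROM") != 0)
        return 0;

    return c == '>';
}
```

Because the prefix is limited to four captured letters followed by a literal ":<", an address that itself begins with f, r, o or m is no longer swallowed by the prefix. Characters after the '>' are still not examined.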

CodePudding user response:

Regarding the first part of the string, if you want to accept only FROM:< or from:< , then you can simply use the function strncmp with both possibilities. Note, however, that this means that for example From:< will not be accepted. In your question, you implied that this is how you want your program to behave, but I'm not sure if this really is the case.
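If mixed-case prefixes such as From:< should be accepted after all, the prefix can be compared character by character after lowercasing. This is only a sketch; the helper name has_from_prefix is hypothetical and not part of any library.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if line begins with "from:<" in any
 * combination of upper and lower case, 0 otherwise. */
int has_from_prefix(const char *line)
{
    static const char prefix[] = "from:<";

    for (size_t i = 0; prefix[i] != '\0'; i++) {
        /* If line is shorter than the prefix, line[i] is '\0' here, the
         * comparison fails, and we return before reading past the end. */
        if (tolower((unsigned char)line[i]) != prefix[i])
            return 0;
    }
    return 1;
}
```

The cast to unsigned char matters because passing a negative char value to tolower() is undefined behavior.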

Generally, I wouldn't recommend using the function sscanf for such a complex task, because that function is not very flexible. Also, in ISO C, it is not guaranteed that character ranges are supported when using the %[] format specifier. Therefore, I would recommend checking the individual parts of the string "manually":

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>

bool is_valid_string( const char *line )
{
    const char *p;

    //verify that string starts with "from:<" or "FROM:<"
    if (
        strncmp( line, "from:<", 6 ) != 0
        &&
        strncmp( line, "FROM:<", 6 ) != 0
    )
    {
        return false;
    }

    //verify that there are no invalid characters before the `>`
    for ( p = line + 6; *p != '>'; p++ )
    {
        if ( *p == '\0' )
            return false;

        if ( isalpha( (unsigned char)*p ) )
            continue;

        if ( isdigit( (unsigned char)*p ) )
            continue;

        if ( strchr( "@:-,.", *p) != NULL )
            continue;

        return false;
    }

    //jump past the '>' character
    p++;

    //verify that we are now at the end of the string
    if ( *p != '\0' )
        return false;

    return true;
}

int main( void )
{
    char line[200];

    //read one line of input
    if ( fgets( line, sizeof line, stdin ) == NULL )
    {
        printf( "Input failure!\n" );
        exit( EXIT_FAILURE );
    }

    //remove newline character
    line[strcspn(line,"\n")] = '\0';

    //call function and print result
    if ( is_valid_string ( line ) )
        printf( "VALID\n" );
    else
        printf( "INVALID\n" );
}

This program has the following output:

This is an invalid string.
INVALID
from:<[email protected]
INVALID
from:<[email protected]>
VALID
FROM:<[email protected]
INVALID
FROM:<[email protected]>
VALID
FROM:<john.doe@example!!!!.com>            
INVALID
FROM:<[email protected]>invalid
INVALID

CodePudding user response:

Use "%n". It records the offset of the scan of input[], if scanning got that far.

Use it to:

  • Detect scan success that includes the >.

  • Detect extra junk.

A check of the return value of sscanf() is not needed.

Also use a width limit.

char output[100];
int n = 0;
// sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>", output);
sscanf(input, "%*[fromFROM]:<           
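
A minimal sketch of the "%n" approach this answer describes, not necessarily the answerer's exact code. The function name is_valid_from, the 0/1 return convention, and placing '-' last in the scan-set (so it is a literal) are assumptions; the 100-byte buffer matches the declaration above.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if input is <prefix>:<addr> with nothing
 * after the '>', 0 otherwise. On success, addr is stored in output. */
int is_valid_from(const char *input, char output[100])
{
    int n = 0;

    /* %n is stored only if everything before it matched, including the
     * literal '>'. The return value of sscanf() is deliberately unused. */
    sscanf(input, "%*[fromFROM]:<%99[@:,.A-Za-z0-9-]>%n", output, &n);

    /* n > 0: the scan reached the '>'; input[n] == '\0': no extra junk */
    return n > 0 && input[n] == '\0';
}
```

The "%*[fromFROM]" prefix is still generous: it accepts any run of those letters, so a stricter prefix test is needed if only from and FROM should pass.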