How to read two words in one string

Time:03-15

I have a sample input file like this:

1344 Muhammad Ayyubi 1
1344 Muhammad Ali Ayyubi 1

The first number, the name fields and the last number are separated by tab characters. However, a person may have two first names. In that case, the two names are separated by a space.

I am trying to read from input file and store them in related variables.

Here is my code that successfully reads when a person has only one name.

fscanf(fp, "%d\t%s\t%s\t%d", &id, firstname, surname, &roomno)

My question: is there any way to read an input file in which a person may have two first names?

Thanks in advance.

CodePudding user response:

You can use the %[ specifier to read whitespace in a string:

fscanf(fp, "%d\t%[^\t]\t%[^\t]\t%d", &id, firstname, surname, &roomno)
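A slightly more defensive variant of the same idea, as a sketch only: it bounds each %[ conversion with a field width and checks the return value. The struct record type, the parse_record name and the 63-character limit are assumptions made for illustration, not part of the question. It uses sscanf() on a line already in memory, but the same format string works with fscanf().

```c
#include <stdio.h>

/* Hypothetical record type; the 63-character name limit is an assumption. */
struct record {
    int id;
    char firstname[64];
    char surname[64];
    int roomno;
};

/* Returns 1 if all four fields were converted, 0 otherwise. */
int parse_record(const char *line, struct record *r)
{
    /* %63[^\t] reads up to 63 non-tab characters, so spaces inside a
       name are kept and the 64-byte buffers cannot overflow. */
    int fields = sscanf(line, "%d\t%63[^\t]\t%63[^\t]\t%d",
                        &r->id, r->firstname, r->surname, &r->roomno);
    return fields == 4;
}
```

With the line "1344<TAB>Muhammad Ali<TAB>Ayyubi<TAB>1", firstname ends up holding "Muhammad Ali". A line that has no tabs at all makes the function return 0 instead of leaving some variables unset.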

CodePudding user response:

Read the line with fgets() which then saves that as a string.

Then parse the string. Save into adequate sized buffers.

A "\t" in a scanf() format scans any amount of white-space, zero or more characters, not just a single tab. Use TABFMT below to scan exactly 1 tab character.

Test results along the way.

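This code uses " %n" to record how far parsing got. A non-zero offset shows that the whole format matched, and checking the character at that offset shows that nothing more follows on the line.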

#define LINE_N 100
char line[LINE_N];
int id;
char firstname[LINE_N];
char surname[LINE_N];
int roomno;

if (fgets(line, sizeof line, fp)) {
  int n = 0;
  #define TABFMT "%*1[\t]"
  #define NAMEFMT "%[^\t]"
  sscanf(line, "%d" TABFMT NAMEFMT TABFMT NAMEFMT TABFMT "%d %n", 
      &id, firstname, surname, &roomno, &n);
  if (n == 0 || line[n]) {
    fprintf(stderr, "Failed to parse <%s>\n", line);
  } else {
    printf("Success: %d <%s> <%s> %d\n", id, firstname, surname, roomno);
  }
}

If the first name or the surname is empty, this code treats that as an error.

An alternate approach would read the line into a string and then use strcspn(), strchr() or strtok() to look for tabs and parse the line into the 4 sub-strings.
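The strcspn() route could look roughly like this. It is a minimal sketch: split_tabs is a hypothetical helper name, and it modifies the line in place. It counts every tab-separated field on the line, so the caller can reject lines with too few or too many fields.

```c
#include <string.h>

/* Split `line` in place on tab characters.
   Stores pointers to the first `max` fields in `fields` and returns the
   total number of fields on the line, which may be larger than `max`.
   A trailing line ending, if present, is removed first. */
size_t split_tabs(char *line, char *fields[], size_t max)
{
    size_t count = 0;
    line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */

    char *p = line;
    for (;;) {
        if (count < max)
            fields[count] = p;            /* remember where this field starts */
        count++;
        size_t len = strcspn(p, "\t");    /* length up to next tab or end */
        if (p[len] == '\0')
            break;                        /* that was the last field */
        p[len] = '\0';                    /* terminate this field */
        p += len + 1;                     /* continue after the tab */
    }
    return count;
}
```

A caller would pass an array of 4 pointers, accept the line only when the return value is exactly 4, and then convert the first and last fields with strtol() while checking for conversion errors.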


The larger issue missed by OP is what to do about ill-formatted input? Error handling is often dismissed with "input will be well formed", yet in real life, bad input does happen and also is the crack the hackers look for. Defensive coding takes steps to validate input. Pedantic code would not use *scanf() at all, but instead fgets(), strcspn(), strspn(), strchr(), strtol() and test, test, test. This answer is a middle-of-the-road testing effort.

CodePudding user response:

The answers to the question as stated are reasonable, but the question is wrong.

The end-goal here is to read human-names. Human names come in quite a variety - not always first, [middle,] last. Baking in this assumption is an error in design.

This is a many, many times repeated error. Better not to repeat.

Simplest solution is to re-order the data fields, and make no assumptions about the structure of names. So the input data becomes:

1344 1 Muhammad Ayyubi
1344 1 Muhammad Ali Ayyubi

Scanning code then can pull off the first two numeric fields, and use the remainder of the line for name (making no assumptions about structure).
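As a rough sketch of that scanning code, assuming the line has already been read with fgets(): parse_reordered and its parameters are hypothetical names chosen for this example. It reads the two numbers, uses " %n" to find where the name starts, and copies the rest of the line unchanged.

```c
#include <stdio.h>
#include <string.h>

/* Parse "<id> <roomno> <name...>". The name is everything after the
   second number, with a trailing line ending removed.
   Returns 1 on success, 0 on malformed input or a name that does not fit. */
int parse_reordered(const char *line, int *id, int *roomno,
                    char *name, size_t name_size)
{
    int pos = 0;
    /* " %n" skips the white-space after the second number and records
       the offset at which the name starts. */
    if (sscanf(line, "%d %d %n", id, roomno, &pos) != 2 || pos == 0)
        return 0;

    size_t len = strcspn(line + pos, "\r\n");
    if (len == 0 || len >= name_size)
        return 0;                         /* empty name, or too long */
    memcpy(name, line + pos, len);
    name[len] = '\0';
    return 1;
}
```

Nothing here cares how many words the name has, so "Muhammad Ayyubi" and "Muhammad Ali Ayyubi" are handled the same way.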

More generally, if you do need to scan fields with embedded whitespace, remember the 32 "control" characters in the ASCII character table, of which ~24 have no assigned semantics (in current use). You can add structure to a file of text with them, for example by using the separator characters (from man ascii):

Oct   Dec   Hex   Char
034   28    1C    FS  (file separator)
035   29    1D    GS  (group separator)
036   30    1E    RS  (record separator)
037   31    1F    US  (unit separator)

There is almost no case where text fields are allowed these characters.
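To illustrate, here is a minimal sketch that pulls one field out of a record whose fields are separated by US (0x1F) and which ends at RS (0x1E) or at the end of the string. The get_field name and this particular record layout are assumptions made for the example, not an established file format.

```c
#include <string.h>

#define US "\x1F"   /* ASCII unit separator, used here between fields */
#define RS "\x1E"   /* ASCII record separator, used here to end a record */

/* Copy field number `index` (0-based) of a US-delimited record into `out`.
   Returns 1 on success, 0 if the field is missing or does not fit. */
int get_field(const char *record, size_t index, char *out, size_t out_size)
{
    const char *start = record;
    for (size_t i = 0; i < index; i++) {
        start += strcspn(start, US RS);   /* move to the next separator */
        if (*start != US[0])
            return 0;                     /* record ended before this field */
        start++;                          /* step over the unit separator */
    }
    size_t len = strcspn(start, US RS);   /* field ends at US, RS or '\0' */
    if (len >= out_size)
        return 0;
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

Because spaces and tabs are ordinary data in this layout, a name such as "Muhammad Ali" needs no special handling.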
