Home > OS >  Improving on 'if/else' strcmp() ladder to determine a useable value
Improving on 'if/else' strcmp() ladder to determine a useable value

Time:01-27

//...
if( strcmp( str, "January" ) == 0 )
    month = 1;
else if( strcmp( str, "February") == 0 )
    month = 2;
//...

Q: Is there any more efficient way of determining that, for instance, "April" is the fourth month of the year? Repeated calls to strcmp() must be terribly inefficient, and if/else ladders tedious to code. Sometimes it's "March" and sometimes it's abbreviated as "MAR"... There must be a better way...

Putting the known strings in a sorted array of structs would allow, at least, binary searching, but still involves a lot of guesswork from the code.

CodePudding user response:

This is a Can I answer my own question? answer. Other answers are welcomed.


There are several ways of translating an arbitrary string from a finite set of strings into a concise, useable form. Most of these involve an iterative (or a sub-optimal linear) search involving repeated comparisons (that may need to account for case sensitivity.)

A response to my response to a recent question suggested "sharing" an (admittedly arcane) hashing function that, with awareness of false positives, return the month ordinal (1-12) when passed a string containing the name of a month (English) in 7-bit ASCII. The function performs primitive operations on the 2nd & 3rd character and out-pops the function's hash value of the string. Note, "January", "jan" and "JAN" all return the value 1. Likewise "feb", "FEBRUARY" and "Feb" would return the value 2.

static int monthOrd( char cp[] ) { return "DIE@CB@LJF@HAG@K"[ cp[1]/4&7 ^ cp[2]*2 &0xF ] &0xF; }

The operations shown were uncovered through a "brute force" permutation of a number of primitive operations seeking a combination that would return 12 different values between 0x0 and 0xF (4 bits). The reader is encouraged to take apart each step of the mangling of the bits of the two ASCII characters. This result was not "invented", but was "discovered".

After the bits of two characters have been mangled, the value is used as an index into a string (aka "a cheap LUT") whose 12 letters A-L are positioned so that "?an" (January) will mangle to an index for the letter 'A'. Masking the low 4 bits of that letter yields the value 1 as the ordinal for the string "JANUARY"... 1 will be the return value when the function is passed variations of the string "Jan".

NB: Using this function allows the caller to check that the string is indeed "JAN", "jan", "January" as suits the application. The caller need not try to match any of the names of the other 11 months. This function WILL return the false positive value 1 for the string "Random", so the caller need only validate against a single month's name (length and case appropriate to the application.)


Bonus round:

static int wkdayOrd( char cp[] ) { return "65013427"[*cp/2   ~cp[1] & 0x7] & 0x7; }

An equivalent function that converts "Sun(day)" (case insensitive) to 1, "MON" to 2, "tue" to 3, etc...

Again, the caller must confirm the string against only ONE day's name to avoid "false positives".


While we're here, the following is an equivalent function for the "number names" from "zero" to "ten", again, case insensitive. (Number names are not abbreviated like month names or weekday names.)

static int numberOrd( char cp[] ) { return "@~IBAH~FCGE~~DJ~"[ ( cp[0] ^ cp[1]/2   cp[2]*4 ) & 0xF ] & 0xF; }

CodePudding user response:

I've checked what happens with gperf passing it all months as "January", "Jan", "JANUARY", "JAN", "january", "jan", and so on.

struct months {
char *name;
int number;
};

#define TOTAL_KEYWORDS 69
#define MIN_WORD_LENGTH 3
#define MAX_WORD_LENGTH 9
#define MIN_HASH_VALUE 3
#define MAX_HASH_VALUE 218
/* maximum key range = 216, duplicates = 0 */

#ifdef __GNUC__
__inline
#else
#ifdef __cplusplus
inline
#endif
#endif
static unsigned int
hash (register const char *str, register size_t len)
{
  static unsigned char asso_values[] =
    {
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219,  10,  80,  75,  95,   5,
      125,  95, 219, 219,   5, 219,  95,  55,  45,  60,
       60, 219,  85,  95,  50,  90,  25, 219, 219,  12,
      219, 219, 219, 219, 219, 219, 219,   0,  40,  35,
       35,  35,  40,  25, 219, 219,   0, 219,  10,  50,
        0,  25,  15, 219,  15,  35,  30,  10,  25, 219,
      219,  25, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219, 219, 219, 219, 219,
      219, 219, 219, 219, 219, 219
    };
  return len   asso_values[(unsigned char)str[2]]   asso_values[(unsigned char)str[1]]   asso_values[(unsigned char)str[0]];
}

struct months *
in_word_set (register const char *str, register size_t len)
{
  static struct months wordlist[] =
    {
      {""}, {""}, {""},
      {"jan",1},
      {""}, {""}, {""},
      {"january",1},
      {"Jan",1},
      {""}, {""}, {""},
      {"January",1},
      {"jun",6},
      {"june",6},
      {""}, {""}, {""},
      {"Jun",6},
      {"June",6},
      {""}, {""}, {""},
      {"jul",7},
      {"july",7},
      {""}, {""}, {""},
      {"Jul",7},
      {"July",7},
      {""}, {""}, {""},
      {"apr",4},
      {""},
      {"april",4},
      {""}, {""},
      {"aug",8},
      {""}, {""},
      {"august",8},
      {""},
      {"Apr",4},
      {""},
      {"April",4},
      {""}, {""},
      {"Aug",8},
      {""}, {""},
      {"August",8},
      {""},
      {"nov",11},
      {""}, {""}, {""}, {""},
      {"november",11},
      {""}, {""}, {""}, {""},
      {"JAN",1},
      {""}, {""}, {""},
      {"JANUARY",1},
      {"mar",3},
      {""},
      {"march",3},
      {""}, {""},
      {"Mar",3},
      {""},
      {"March",3},
      {""}, {""},
      {"may",5},
      {""},
      {"MAY",5},
      {""}, {""},
      {"May",5},
      {""}, {""}, {""}, {""},
      {"sep",9},
      {""}, {""}, {""}, {""},
      {"oct",10},
      {"september",9},
      {""}, {""},
      {"october",10},
      {"Nov",11},
      {""}, {""}, {""}, {""},
      {"November",11},
      {""}, {""}, {""}, {""},
      {"dec",12},
      {""}, {""}, {""}, {""},
      {"december",12},
      {""}, {""}, {""}, {""},
      {"feb",2},
      {""}, {""}, {""}, {""},
      {"february",2},
      {""}, {""}, {""}, {""},
      {"Oct",10},
      {""}, {""}, {""},
      {"October",10},
      {"NOV",11},
      {""}, {""}, {""}, {""},
      {"NOVEMBER",11},
      {""}, {""}, {""}, {""},
      {"JUN",6},
      {"JUNE",6},
      {""}, {""}, {""},
      {"Sep",9},
      {""}, {""}, {""}, {""},
      {"MAR",3},
      {"September",9},
      {"MARCH",3},
      {""}, {""},
      {"APR",4},
      {""},
      {"APRIL",4},
      {""}, {""},
      {"SEP",9},
      {""}, {""}, {""}, {""},
      {"Dec",12},
      {"SEPTEMBER",9},
      {""}, {""}, {""},
      {"December",12},
      {""}, {""}, {""}, {""},
      {"DEC",12},
      {""}, {""}, {""}, {""},
      {"DECEMBER",12},
      {""}, {""}, {""}, {""},
      {"OCT",10},
      {""}, {""}, {""},
      {"OCTOBER",10},
      {"JUL",7},
      {"JULY",7},
      {""}, {""}, {""},
      {"AUG",8},
      {""}, {""},
      {"AUGUST",8},
      {""},
      {"Feb",2},
      {""}, {""}, {""}, {""},
      {"February",2},
      {""}, {""}, {""}, {""},
      {"FEB",2},
      {""}, {""}, {""}, {""},
      {"FEBRUARY",2}
    };

  if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
    {
      register unsigned int key = hash (str, len);

      if (key <= MAX_HASH_VALUE)
        {
          register const char *s = wordlist[key].name;

          if (*str == *s && !strcmp (str   1, s   1))
            return &wordlist[key];
        }
    }
  return 0;
}

I guess this is pretty fast and requires only one strcmp per string. This is exactly what is used in GCC for keyword checking.

A very nice introduction to gperf is available here.

  • Related