Why is my RegExp.prototype.exec() returning array of undefined?

Time:08-21

I have a regular expression like this:

/(AAPL)|((Apple,?.?(([iI]nc)\.?)?))|((technology|electronics){1,2})|(consumer electronics)|((timothy d. cook|thimothy cook))|((smart phones|computers|laptops){1,3})|((iphone|ipad|ipod|airpods|macbook){1,5})/g

And a string:

"AAPL apple inc., apple, apple, inc., apple, inc technology, electronics consumer electronics timothy d. cook, thimothy cook smart phones, computers, laptops iphone, ipad, ipod, airpods, macbook"

The result of RegExp.prototype.exec() is an Array with mostly undefined items:

[ "AAPL", "AAPL", undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined, undefined ]

However, when I try it online using https://regexr.com/ it matches all words I expect it to match.

I have also tried wrapping the above regular expression in parentheses with a quantifier, like so: (string){1,100}, but I still get an Array with mostly undefined items.

What am I not understanding about regular expressions?

Below is my TypeScript code:

type SearchPatterns = {
    [key: string]: {
        [key: string]: RegExp;
    }
  };

const SEARCH_PATTERNS: SearchPatterns = {
    AAPL:  {
        symbol: /(AAPL)/, //"AAPL"
        name: /((Apple,?.?(([iI]nc)\.?)?))/, //"apple inc.", "apple", "apple, inc.", "apple, inc"
        sectors: /((technology|electronics){1,2})/, //"technology", "electronics"
        industry: /(consumer electronics)/, //"consumer electronics"
        executives: /((timothy d. cook|thimothy cook))/, //"timothy d. cook", "thimothy cook"
        productCategories: /((smart phones|computers|laptops){1,3})/, //"smart phones", "computers", "laptops"
        products: /((iphone|ipad|ipod|airpods|macbook){1,5})/ //"iphone", "ipad", "ipod", "airpods", "macbook"
    }
};

function screenStockInfo(
    symbol: string,
    corpus: string,
    searchPatterns: SearchPatterns
    ) {
    const regStr = (Object.values(searchPatterns[symbol]) as RegExp[])
                        .map(value => value.source)
                        .join("|");
    const dynamicRegEx = new RegExp(regStr, "g");
    console.log(dynamicRegEx.exec(corpus));
    console.log(dynamicRegEx);
    return dynamicRegEx.test(corpus);
};

CodePudding user response:

For methods that include subgroups in their results (such as RegExp.exec and String.matchAll), any capturing subgroup that did not participate in the match is represented by undefined. Every subgroup is numbered, so something must appear at each subgroup's numeric index in the result. If unmatched groups were left out, the indices of the matched groups would differ from their group numbers. Your pattern is an alternation of capturing groups, so each match fills only the groups of one alternative and leaves all the others undefined.
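A minimal sketch of this behaviour, using a pattern cut down from the one in the question (the two-alternative pattern and the variable names are illustrative, not the asker's code):

```typescript
// Two alternatives, each wrapped in its own capturing group.
const twoGroups = /(AAPL)|(consumer electronics)/;

// The first match is "AAPL": group 1 participates, group 2 does not.
const firstResult = twoGroups.exec("AAPL consumer electronics");

// Copy into a plain array to drop the extra index/input properties.
const slots: (string | undefined)[] = firstResult ? Array.from(firstResult) : [];

console.log(slots); // [ "AAPL", "AAPL", undefined ]
```

Index 0 is the full match, index 1 is group 1, and index 2 is the slot reserved for group 2, which stays undefined because that alternative was not the one that matched.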

For any subgroups you don't want to be captured, use non-capturing subgroups (or use String.match if you only need full matches, and no subgroups). If you need to capture a subgroup, simply ignore any missing values. You can also use named groups, which produce more readable results and are less brittle (as you're less likely to use the wrong index).
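As a rough illustration of both suggestions, the sketch below rewrites a cut-down version of the pattern first with non-capturing groups and then with named groups. The group names symbol and industry are chosen here for the example and are not part of the original code.

```typescript
// Non-capturing groups (?:...) group alternatives without reserving result slots.
const nonCapturing = /(?:AAPL)|(?:consumer electronics)/;
const plainResult = nonCapturing.exec("consumer electronics");
const plainSlots: string[] = plainResult ? Array.from(plainResult) : [];
console.log(plainSlots); // [ "consumer electronics" ]

// Named groups are looked up by name instead of by position.
const namedPattern = /(?<symbol>AAPL)|(?<industry>consumer electronics)/;
const namedResult = namedPattern.exec("consumer electronics");
const namedGroups: Record<string, string | undefined> = namedResult?.groups ?? {};

console.log(namedGroups.symbol);   // undefined (this alternative did not match)
console.log(namedGroups.industry); // "consumer electronics"
```

Named groups that did not participate are still undefined, but reading them by name makes it harder to pick up the wrong slot.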

Unlike String.match and String.matchAll, both RegExp.test and RegExp.exec find a single match at a time. You must call them multiple times to get successive matches. This is covered in the MDN documentation for test() and exec(), respectively:

JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g., /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, test() can be used to iterate over multiple matches in a string of text (with capture groups).

JavaScript RegExp objects are stateful when they have the global or sticky flags set (e.g. /foo/g or /foo/y). They store a lastIndex from the previous match. Using this internally, exec() can be used to iterate over multiple matches in a string of text (with capture groups), as opposed to getting just the matching strings with String.prototype.match().
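The statefulness described in these quotes can be sketched as a loop over exec(). The pattern and sample text here are simplified stand-ins, not the question's full expression:

```typescript
const statefulPattern = /AAPL|apple|iphone/g;
const sampleText = "AAPL apple iphone";

const found: string[] = [];
let current: RegExpExecArray | null;

// Each exec() call resumes from statefulPattern.lastIndex. When no match
// remains, exec() returns null and resets lastIndex to 0.
while ((current = statefulPattern.exec(sampleText)) !== null) {
    found.push(`${current[0]}@${current.index}`);
}
console.log(found); // [ "AAPL@0", "apple@5", "iphone@11" ]

// exec() and test() share the same lastIndex on a given RegExp object.
const shared = /AAPL/g;
shared.exec("AAPL");                      // matches; lastIndex is now 4
const secondCall = shared.test("AAPL");   // search starts at index 4
console.log(secondCall); // false
```

The second half matters for the function in the question, which calls exec() and then test() on the same global RegExp: the test() call continues from wherever exec() stopped rather than from the start of the string.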

For global RegExps, String.match returns full matches, but no subgroups. Note this is covered in the documentation on MDN:

Return value

An Array whose contents depend on the presence or absence of the global (g) flag, or null if no matches are found.

  • if the g flag is used, all results matching the complete regular expression will be returned, but capturing groups are not included.
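A short sketch of what this means in practice, with an illustrative three-alternative pattern (shortened from the question's expression):

```typescript
const shortCorpus = "AAPL consumer electronics iphone";
const globalPattern = /(AAPL)|(consumer electronics)|(iphone)/g;

// With the g flag, match() returns only the full matches; the capturing
// groups contribute no extra slots. It returns null when nothing matches.
const fullMatches: string[] = shortCorpus.match(globalPattern) ?? [];

console.log(fullMatches); // [ "AAPL", "consumer electronics", "iphone" ]
```

If only the matched text is needed, this avoids the undefined entries without changing the pattern.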

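String.matchAll is similar to RegExp.exec in that it returns successive matches along with their capturing groups, but it returns them through an iterator, so you can spread the results into an array. The regular expression must have the g flag, otherwise matchAll throws a TypeError: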

[...corpus.matchAll(dynamicRegEx)];
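A slightly fuller sketch of the same call, again with a cut-down pattern and made-up variable names, showing what each entry contains:

```typescript
const demoCorpus = "AAPL iphone, ipad";
const demoPattern = /(AAPL)|(iphone|ipad)/g; // matchAll requires the g flag

// Collect every match object from the iterator.
const allMatches = Array.from(demoCorpus.matchAll(demoPattern));

// Full matched text of each entry.
const matchedTexts = allMatches.map(m => m[0]);
console.log(matchedTexts); // [ "AAPL", "iphone", "ipad" ]

// Capturing groups of each entry; non-participating groups are undefined.
const groupSlots = allMatches.map(m => m.slice(1));
console.log(groupSlots);
// [ [ "AAPL", undefined ], [ undefined, "iphone" ], [ undefined, "ipad" ] ]
```

Each entry has the same shape as an exec() result, so the undefined slots for non-participating groups still appear and can be skipped when reading the groups.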