RegEx: ignore word preceding a character set

I have the following string that I am trying to match with RegEx:

286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)

This is my code/regex:

const matches = text.matchAll(/(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g);
for (const match of matches) {
    const rank = parseInt(match[1].replace(/[^\d]/g, ''));
    const category = match[2].trim();
    console.log(`${category} = ${rank}`);
}

However, the only parts it should match are: 286,879 in Home & Kitchen, 339 in Cardboard Cutouts, 2,945 in Jigsaws (Toys & Games)

The expected output should be:

Home & Kitchen = 286879

Cardboard Cutouts = 339

Jigsaws = 2945

How can I adjust the regex to ignore the "100 in Home & Kitchen" part?

Thanks

Answer:

Regex groups:

- result: one record (row) from the input
- data: the number, including commas
- cat: the category name
- extra: trailing text to be ignored

In JavaScript, replace each result with the re-ordered cat ($3), " = " and data ($2), then remove the commas:

const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;

// Alternative syntax using RegExp constructor
// const regex = new RegExp('(?<result>(?<data>^[\\d|,]+)(?: in )(?<cat>.+?)(?<extra>\\s+(?:\\(.+?\\)?)?))$', 'gm')

const str = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)`;
const subst = `$3 = $2`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst).replaceAll(',', ''); // replaceAll: replace(',', '') would remove only the first comma

console.log('Substitution result: ', result);
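A minimal sketch of the same pattern used with matchAll instead of replace, reading the named groups through m.groups so that commas are stripped from the number only, not from the category text. The names lineRegex, input and ranks are illustrative. Note that the extra group needs at least one whitespace character (or a parenthetical) after the category, so the sample keeps the question's trailing spaces.

```javascript
// The answer's pattern with the "+" quantifiers restored.
const lineRegex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;

// Input copied from the question, including its trailing spaces:
// the `extra` group needs whitespace after the category name.
const input = [
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  ",
  "339 in Cardboard Cutouts    ",
  "2,945 in Jigsaws (Toys & Games)",
].join("\n");

// Read the named groups instead of substituting, so commas are removed
// from the number only.
const ranks = [...input.matchAll(lineRegex)].map((m) => ({
  category: m.groups.cat,
  rank: Number(m.groups.data.replaceAll(",", "")),
}));

for (const { category, rank } of ranks) {
  console.log(`${category} = ${rank}`);
}
```

Because of the ^ anchor, "100 in Home & Kitchen" can never start a match, which is what removes it here.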

Answer:

If you only want to exclude the things in the parentheses, you could try something like this:

/^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm

Then ignore the third capture group.
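A minimal sketch of how this pattern might be applied with matchAll over the whole text, keeping only groups 1 and 2. Group 2 can end with spaces (for example "Home & Kitchen "), hence the trim(). The names parenRegex, text and lines are illustrative.

```javascript
// The answer's pattern; group 3 captures the optional "(...)" tail, which is discarded.
const parenRegex = /^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm;

const text = [
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)",
].join("\n");

// Build "category = rank" strings from groups 2 and 1 only.
const lines = [...text.matchAll(parenRegex)].map(
  ([, number, category]) => `${category.trim()} = ${number.replaceAll(",", "")}`
);

console.log(lines.join("\n"));
```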

Answer:

You might use 2 capture groups:

(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])

Explanation

- (?<!Top\s+) Negative lookbehind, assert that Top followed by 1+ whitespace chars is not directly to the left of the current position
- \b A word boundary to prevent a partial match, such as 00 inside 100
- (\d+(?:,\d+)?) Capture group 1, match 1+ digits with an optional , and 1+ digits
- \s+in\s+ Match in between 1+ whitespace chars
- ( Capture group 2
  - [^()\n]*[^\s()] Match optional chars other than ( ) or a newline, then end on a char that is not whitespace, ( or )
- ) Close group 2
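To see why the \b is needed, here is a reduced sketch (the names sample, withoutBoundary, withBoundary and stray are illustrative) that runs the lookbehind with and without the boundary on the parenthetical alone. Without \b, the engine rejects the position before "1", then retries one character later, where the text directly to the left is "1" rather than "Top ", so it matches "00".

```javascript
const sample = "(See Top 100 in Home & Kitchen)";

// Lookbehind only: the position before "1" is rejected, the one before "00" is not.
const withoutBoundary = /(?<!Top\s+)(\d+(?:,\d+)?)\s+in/;
// With \b: positions inside "100" are not word boundaries, so they are skipped too.
const withBoundary = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in/;

const stray = sample.match(withoutBoundary);
console.log(stray[1]); // "00"
console.log(sample.match(withBoundary)); // null
```

This is the same failure the question's (?<!Top ) pattern runs into.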


const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/;

[
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(`${m[2]} = ${m[1].replace(",", "")}`);
  }
});

Note that using \s could also match newlines.
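The newline caveat matters when the pattern runs over the whole block at once instead of line by line. A sketch under two assumptions that are not part of the answer: [^\S\n] (whitespace other than a newline) stands in for \s so a match cannot cross a line break, and the comma group is repeated with * so that a rank of a million or more is captured whole; with (?:,\d+)? a rank like 1,234,567 would only match as 234,567. The "Books" line and the names rankRegex, blockText and allRanks are made up for illustration.

```javascript
// Hypothetical variant of the answer's pattern:
// - [^\S\n]+ replaces \s+ so no match spans two lines
// - (?:,\d+)* replaces (?:,\d+)? so 1,234,567 is captured whole
const rankRegex = /(?<!Top[^\S\n]+)\b(\d+(?:,\d+)*)[^\S\n]+in[^\S\n]+([^()\n]*[^\s()])/g;

const blockText = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)
1,234,567 in Books`; // last line is a made-up example

const allRanks = [...blockText.matchAll(rankRegex)].map(([, number, category]) => ({
  category,
  rank: Number(number.replaceAll(",", "")),
}));

for (const { category, rank } of allRanks) {
  console.log(`${category} = ${rank}`);
}
```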