RegEx ignore word preceding a character set

Time:05-02

I have the following string that I am trying to match with RegEx:

286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)

This is my code/regex:

            const matches = text.matchAll(/(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g);
            for(const match of matches){
                const rank = parseInt(match[1].replace(/[^\d]/g, ''));
                const category = match[2].trim()
                console.log(`${category} = ${rank}`)
            }

However, the only parts it should match on are: 286,879 in Home & Kitchen, 339 in Cardboard Cutouts, 2,945 in Jigsaws (Toys & Games)

The expected output should be:

Home & Kitchen = 286879

Cardboard Cutouts = 339

Jigsaws = 2945

How can I adjust the regex to ignore the "100 in Home & Kitchen" string?

Thanks

CodePudding user response:

regex Groups:
  1. result - one record from input (row)
  2. data - numbers (including ,)
  3. cat - category name
  4. extra - to be ignored
JS
  • replace result with re-ordered cat ($3), = and data ($2)
  • replace , with empty
const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s+(?:\(.+?\)?)?))$/gm;

// Alternative syntax using RegExp constructor
// const regex = new RegExp('(?<result>(?<data>^[\\d|,]+)(?: in )(?<cat>.+?)(?<extra>\\s+(?:\\(.+?\\)?)?))$', 'gm')

const str = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  
339 in Cardboard Cutouts    
2,945 in Jigsaws (Toys & Games)`;
const subst = `$3 = $2`;

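// The substituted value will be contained in the result variable.
// A regex with the g flag is used for the commas, because replace(',', '')
// with a string pattern would only remove the first comma in the whole text.
const result = str.replace(regex, subst).replace(/,/g, '');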

console.log('Substitution result: ', result);

CodePudding user response:

If you only want to exclude the things in the parentheses, you could try something like this:

/^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm

And ignore the third capture group
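
As a rough sketch of how that pattern could be plugged into the loop from the question: the names `input`, `lineRegex` and `parseRanks` are made up for this illustration, and the sample text is rebuilt with `join("\n")` so the trailing spaces from the question stay visible.

```javascript
// Sample text from the question (trailing spaces kept on purpose).
const input = [
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  ",
  "339 in Cardboard Cutouts    ",
  "2,945 in Jigsaws (Toys & Games)",
].join("\n");

// Pattern from this answer: anchored per line via ^, $ and the m flag.
// Group 3 (the optional parenthesised tail) is captured but never read.
const lineRegex = /^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm;

// Hypothetical helper: returns one "category = rank" string per line.
function parseRanks(text) {
  const results = [];
  for (const match of text.matchAll(lineRegex)) {
    const rank = parseInt(match[1].replace(/[^\d]/g, ""), 10);
    const category = match[2].trim(); // group 2 may carry trailing spaces
    results.push(`${category} = ${rank}`);
  }
  return results;
}

console.log(parseRanks(input).join("\n"));
```

Because the match is anchored to the start of each line, the `100 in Home & Kitchen` inside the parentheses can never start a match of its own.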

CodePudding user response:

You might use 2 capture groups:

(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])

Explanation

  • (?<!Top\s+) Negative lookbehind, assert that there is no Top followed by 1+ whitespace chars directly to the left of the current position.
  • \b A word boundary to prevent a partial word match
  • (\d+(?:,\d+)?) Capture group 1, match 1+ digits with an optional , and 1+ digits
  • \s+in\s+ Match in between 1+ whitespace chars
  • ( Capture group 2
    • [^()\n]*[^\s()] Match optional chars other than a newline and ( ), ending on a char that is neither whitespace nor a parenthesis
  • ) Close group 2
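
To see what the word boundary contributes, here is a small comparison of the same pattern with and without `\b`. It is only a sketch; the variable names are invented for the example. Without `\b`, the lookbehind rejects a match starting at the `1` of `100`, but a match starting one character later at `00` is no longer preceded by `Top ` and slips through, which appears to be why the lookbehind alone in the question was not enough.

```javascript
const sample = "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)";

// Same pattern twice; the only difference is the \b after the lookbehind.
const withoutBoundary = /(?<!Top\s+)(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;
const withBoundary = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;

// Collect capture group 1 (the rank) of every match.
const ranksWithout = [...sample.matchAll(withoutBoundary)].map(m => m[1]);
const ranksWith = [...sample.matchAll(withBoundary)].map(m => m[1]);

console.log(ranksWithout); // [ '286,879', '00' ]
console.log(ranksWith);    // [ '286,879' ]
```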


const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/;

[
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // m[1] is the rank (with its comma), m[2] is the category
    console.log(`${m[2]} = ${m[1].replace(",", "")}`)
  }
})

Note that using \s could also match newlines.
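
If the text is processed as one multi-line string rather than line by line, one possible variation (not part of the answer above) is to swap `\s` for `[^\S\n]`, which matches whitespace except a newline, and to add the `g` flag for `matchAll`. The sketch below also changes `(?:,\d+)?` to `(?:,\d+)*` on the assumption that ranks with more than one comma can occur; the names `listing`, `looseRegex`, `strictRegex` and `wrapped` are illustrative only.

```javascript
const listing = [
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)  ",
  "339 in Cardboard Cutouts    ",
  "2,945 in Jigsaws (Toys & Games)",
].join("\n");

// Pattern from the answer, with the g flag added.
const looseRegex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;

// Variation: [^\S\n] = any whitespace except a newline; (?:,\d+)* allows
// several comma groups (assumption: values such as 1,234,567 may appear).
const strictRegex = /(?<!Top[^\S\n]+)\b(\d+(?:,\d+)*)[^\S\n]+in[^\S\n]+([^()\n]*[^\s()])/g;

const lines = Array.from(
  listing.matchAll(strictRegex),
  m => `${m[2]} = ${m[1].replace(/,/g, "")}`
);
console.log(lines.join("\n"));

// A number and "in" split across two lines: \s accepts it, [^\S\n] does not.
const wrapped = "12\nin Toys";
console.log(wrapped.match(looseRegex));  // [ '12\nin Toys' ]
console.log(wrapped.match(strictRegex)); // null
```

Whether a line break between the number and `in` should count as a match depends on the input, so the stricter class is only worth using if such breaks would be false positives.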
