I have the following string that I am trying to match with RegEx:
286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)
This is my code/regex:
const matches = text.matchAll(/(?<!Top )([\d,|]+) in[\s\n ]([\w&'\s]+)/g);
for (const match of matches) {
  const rank = parseInt(match[1].replace(/[^\d]/g, ''));
  const category = match[2].trim()
  console.log(`${category} = ${rank}`)
}
However, the only parts it should match on are: 286,879 in Home & Kitchen, 339 in Cardboard Cutouts, and 2,945 in Jigsaws (Toys & Games)
The expected output should be:
Home & Kitchen = 286879
Cardboard Cutouts = 339
Jigsaws = 2945
How can I adjust the regex to ignore the "100 in Home & Kitchen" string?
Thanks
CodePudding user response:
Regex groups:
- result - one record from the input data (a row)
- data - the number (including ,)
- cat - the category name
- extra - the part to be ignored
JS:
- replace result with the re-ordered cat ($3), = and data ($2)
- replace , with an empty string
const regex = /(?<result>(?<data>^[\d|,]+)(?: in )(?<cat>.+?)(?<extra>\s*(?:\(.+?\)?)?))$/gm;
// Alternative syntax using RegExp constructor
// const regex = new RegExp('(?<result>(?<data>^[\\d|,]+)(?: in )(?<cat>.+?)(?<extra>\\s*(?:\\(.+?\\)?)?))$', 'gm')
const str = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;
const subst = `$3 = $2`;
// The substituted value will be contained in the result variable.
// The comma pattern needs the g flag: a plain string pattern replaces only the first comma.
const result = str.replace(regex, subst).replace(/,/g, '');
console.log('Substitution result: ', result);
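Since the pattern names its groups, the values can also be read by name instead of by position. The following is only a sketch: it uses a simplified variant of the pattern above, and `rankPattern`, `parseRanks` and `sample` are hypothetical names introduced for illustration.

```javascript
// Simplified variant of the named-group pattern (an assumption, not the exact regex above).
// data  = the number, cat = the category, extra = an optional trailing "(...)" part.
const rankPattern = /^(?<data>[\d,]+) in (?<cat>.+?)(?<extra>\s*\(.*\))?$/gm;

// Hypothetical helper: returns one { category, rank } object per matching line.
function parseRanks(text) {
  const rows = [];
  for (const match of text.matchAll(rankPattern)) {
    const { data, cat } = match.groups;
    // Strip every comma before converting the number.
    rows.push({ category: cat, rank: Number(data.replace(/,/g, "")) });
  }
  return rows;
}

const sample = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;

for (const { category, rank } of parseRanks(sample)) {
  console.log(`${category} = ${rank}`);
}
```

In a replacement string the same groups can be referenced as `$<cat>` and `$<data>`, which avoids counting parentheses.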
CodePudding user response:
If you only want to exclude the things in the parentheses, you could try something like this:
/^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm
And ignore the third capture group.
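As a rough sketch of how that could plug into the loop from the question (the names `listingText`, `listingPattern` and `outputLines` are made up for this example):

```javascript
const listingText = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;

// Anchored to whole lines (m flag); group 3 swallows the optional "(...)" tail.
const listingPattern = /^([\d,|]+) in[\s\n ]([\w&'\s]+)(\s*\(.*\)\s*)?$/gm;

const outputLines = [];
for (const match of listingText.matchAll(listingPattern)) {
  const rank = parseInt(match[1].replace(/[^\d]/g, ""), 10);
  // Group 2 may end with a space before "(", so trim it; match[3] is simply not used.
  const category = match[2].trim();
  outputLines.push(`${category} = ${rank}`);
}
console.log(outputLines.join("\n"));
```

Because the pattern starts with `^`, the "100 in Home & Kitchen" text inside the parentheses can never start a match of its own.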
CodePudding user response:
You might use 2 capture groups:
(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])
Explanation
- (?<!Top\s+) Negative lookbehind, assert not Top followed by 1+ whitespace chars directly to the left of the current position
- \b A word boundary to prevent a partial word match
- (\d+(?:,\d+)?) Capture group 1, match 1+ digits with an optional , and 1+ digits
- \s+in\s+ Match in between 1+ whitespace chars
- ( Capture group 2
- [^()\n]*[^\s()] Match optional chars other than a newline and ( ), then a single char that is not whitespace or ( )
- ) Close group 2
const regex = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/;
[
  "286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)",
  "339 in Cardboard Cutouts",
  "2,945 in Jigsaws (Toys & Games)"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(`${m[2]} = ${m[1].replace(",", "")}`)
  }
})
Note that \s can also match newlines.
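If the pattern is run with the g flag over one multi-line string rather than line by line, that newline matching can join a number at the end of one line with an "in" on the next. One possible way around it, shown here only as a sketch, is to replace \s with [^\S\n] (whitespace except a newline). The names `loosePattern`, `sameLinePattern`, `wrappedText` and `rankText` are hypothetical, and `wrappedText` is an invented input used purely to show the difference.

```javascript
// Pattern using \s, which also matches "\n".
const loosePattern = /(?<!Top\s+)\b(\d+(?:,\d+)?)\s+in\s+([^()\n]*[^\s()])/g;
// Same pattern with [^\S\n]: any whitespace except a newline, so a match stays on one line.
const sameLinePattern = /(?<!Top[^\S\n]+)\b(\d+(?:,\d+)?)[^\S\n]+in[^\S\n]+([^()\n]*[^\s()])/g;

// Invented input: the number and "in" sit on different lines.
const wrappedText = "Quantity: 12\nin Stock items";
console.log([...wrappedText.matchAll(loosePattern)].length);    // 1 (matches across the line break)
console.log([...wrappedText.matchAll(sameLinePattern)].length); // 0

// The sample from the question still yields the three expected rows.
const rankText = `286,879 in Home & Kitchen (See Top 100 in Home & Kitchen)
339 in Cardboard Cutouts
2,945 in Jigsaws (Toys & Games)`;
const rankLines = [...rankText.matchAll(sameLinePattern)]
  .map(m => `${m[2]} = ${m[1].replace(",", "")}`);
console.log(rankLines.join("\n"));
```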