Here is the problem I am trying to solve:
I have a dictionary of tags in JS:
Tags= ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5']
I have a database request that provides me with a string. I need to pick certain values from the string using Regex with the following conditions:
I would like to pick the word (including ä, ë, ü, ö) that comes immediately after [X] (the letter X between brackets), but NOT the words that come after it:
- [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]
- Expected output: sunset
I would like to pick the word OR the words (including ä,ë,ü,ö) that come after one of the tags in my dictionary variable, even if there are other brackets in between:
- [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: a
- [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: paintball
- [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht -- Expected output: paintball, yacht.
- [UnRelatedTag][X]sunset[Y]beach[Tag1]snowball -- Expected output: snowball.
I tried hard, and other people helped in another thread, but it is still not working.
- The final result is to be used in JS, to concat the word after [X] and the single/multiple words after [Tag] into one string.
Thank you for your help
CodePudding user response:
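You can simply capture whatever sits between a left boundary and a right boundary: left boundary, then (.*?), then right boundary.
Example: [X]](.*?)\W[Y] captures sunset, the text between [X] and [Y].
You can also use https://regex101.com/ to test the pattern; it explains each part of the regex as you type.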
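A minimal sketch of the boundary idea in JavaScript. It assumes the word of interest ends at the next opening bracket (not necessarily [Y]), and the helper name wordAfterX is made up for illustration; the brackets are escaped explicitly here:

```javascript
// Hypothetical helper (not from the thread): returns the text between
// the literal tag [X] and the next '[' (or the end of the string),
// or null when [X] does not occur at all.
function wordAfterX(str) {
  // \[X\]     left boundary
  // (.*?)     lazy capture of the text in between
  // (?=\[|$)  right boundary: next opening bracket or end of input
  const m = str.match(/\[X\](.*?)(?=\[|$)/);
  return m ? m[1] : null;
}

console.log(wordAfterX('[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball'));
// prints "sunset"
```

The lazy quantifier matters: a greedy (.*) would run on to the last bracket in the string instead of stopping at the first one.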
CodePudding user response:
Here are example input strings and code to extract the words based on your spec:
const strings = [
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball'
];
const tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];
// Group 1: the word right after [X]; group 2: everything after the first tag from the list.
const regex1 = new RegExp('\\[X\\]([\\wäëüö]+).*?\\[(?:' + tags.join('|') + ')\\](.*)', 'i');
// Matches any single [...] tag; used to split the trailing text into words.
const regex2 = /\[[^\]]*\]/;
strings.forEach(str => {
  let result = [];
  let m = str.match(regex1);
  if (m) {
    result.push(m[1]);
    m[2].split(regex2).filter(Boolean).forEach(s => {
      result.push(s);
    });
  }
  console.log(str + '\n ==> ' + result.join(', '));
});
Output:
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]
==> sunset, a
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]
==> sunset, paintball
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht
==> sunset, paintball, yacht
[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball
==> sunset, snowball
Explanation:
- regex1:
  - built dynamically based on your tag array
  - two capture groups, one for the word after tag [X], one for all text that follows a tag in your tag array
- if there is a match:
  - the first capture group is added to the result
  - the second capture group:
    - is split on tag pattern [...]
    - the .filter(Boolean) filters out empty strings - you could filter further by word pattern of interest
    - each split item is added to the result
- join the result array with any delimiter you want, here ', '
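The remark about filtering further by word pattern, together with the question's goal of getting one concatenated string, could be sketched roughly as follows. The names escapeRegex and extractWords are hypothetical, the default space delimiter is an assumption, and escaping the tag names is an extra precaution that the code above does not take:

```javascript
// Tag list mirrors the one in the question.
const tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];

// Hypothetical helper: escape regex metacharacters so that a tag
// containing e.g. '+' or '.' cannot change the meaning of the pattern.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: returns the word after [X] plus every word that
// follows the first listed tag, joined into one string.
function extractWords(str, tagList, delimiter = ' ') {
  const afterX = new RegExp(
    '\\[X\\]([\\wäëüö]+).*?\\[(?:' + tagList.map(escapeRegex).join('|') + ')\\](.*)',
    'i'
  );
  const m = str.match(afterX);
  if (!m) return '';
  // Split the tail on any [...] tag, then keep only pieces that consist
  // entirely of word characters or the listed umlauts.
  const words = m[2].split(/\[[^\]]*\]/).filter(s => /^[\wäëüö]+$/i.test(s));
  return [m[1], ...words].join(delimiter);
}

console.log(extractWords(
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag]yacht',
  tags
));
// prints "sunset paintball yacht"
```

The word filter replaces .filter(Boolean): it still drops empty strings, and it also drops pieces such as stray whitespace or punctuation between tags.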