Regex to select one or multiple words after brackets

Here is the problem I am trying to solve:

I have an array of tags in JS:

const Tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];

I have a database request that provides me with a string. I need to pick certain values from the string using Regex with the following conditions:

I would like to pick the word (including ä, ë, ü, ö) that comes immediately after [X] (the letter X between brackets), but NOT the words that come after any later bracket, such as [Y]:

[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: sunset

I would also like to pick the word OR the words (including ä, ë, ü, ö) that come after one of the tags in my Tags array, even if there are other brackets in between:

[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: a
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: paintball
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht -- Expected output: paintball, yacht
[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball -- Expected output: snowball

I tried hard and got help from other people in another thread, but it is still not working.

The final result will be used in JS to concatenate the word after [X] and the single or multiple words after [Tag] into one string.

Thank you for your help

CodePudding user response:

You can simply use the pattern: left boundary, then (.*?), then right boundary.

Example: \[X\](.*?)\[Y\] captures "sunset".

You can also test the pattern at https://regex101.com/, which explains each part of it.
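As a minimal sketch of the boundary idea for the word after [X]: the function name wordAfterX is hypothetical, and the character class is an assumption that extends \w with the umlauts named in the question. It stops at the next bracket instead of requiring [Y] specifically.

```javascript
// Capture the word that immediately follows [X].
// [\wäëüö]+ cannot match "[", so the capture ends at the next bracket.
const afterX = /\[X\]([\wäëüö]+)/;

// Hypothetical helper: returns the word after [X], or null if there is none.
function wordAfterX(str) {
  const m = str.match(afterX);
  return m ? m[1] : null;
}

console.log(wordAfterX('[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball')); // sunset
```

This covers only the first condition; the words after a tag from the array still need a second step.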

CodePudding user response:

Here are example input strings and code to extract the words based on your spec:

const strings = [
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball'
];
const tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];
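// Group 1: the word right after [X]; group 2: all text after the first tag from the tags array.
const regex1 = new RegExp('\\[X\\]([\\wäëüö]+).*?\\[(?:' + tags.join('|') + ')\\](.*)', 'i');
// Matches any bracketed tag; used to split the remaining text into words.
const regex2 = /\[[^\]]*\]/;

strings.forEach(str => {
  const result = [];
  const m = str.match(regex1);
  if (m) {
    result.push(m[1]);
    // Split on [...] and drop the empty strings left between adjacent tags.
    m[2].split(regex2).filter(Boolean).forEach(s => {
      result.push(s);
    });
  }
  console.log(str + '\n ==> ' + result.join(', '));
});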

Output:

[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]
 ==> sunset, a
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]
 ==> sunset, paintball
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht
 ==> sunset, paintball, yacht
[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball
 ==> sunset, snowball

Explanation:

regex1:
- built dynamically based on your tag array
- two capture groups, one for the word after tag [X], one for all text that follows a tag in your tag array
if there is a match:
- the first capture group is added to the result
- the second capture group:
  - is split on tag pattern [...]
  - the .filter(Boolean) filters out empty strings
  - you could filter further by word pattern of interest
  - each split item is added to the result
finally, join the result array with any delimiter you want, here ', ' (comma and space)
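The steps above can be wrapped into one reusable function that returns the single concatenated string the question asks for. This is only a sketch under a few assumptions: the names escapeRegExp and extractWords are hypothetical, a space is chosen as the delimiter, the "word pattern of interest" is taken to be letters, digits, underscore and the listed umlauts, and tag names are escaped in case they ever contain regex metacharacters.

```javascript
// Hypothetical helper: escape regex metacharacters in a tag name.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns the word after [X] plus the words after the
// first matching tag, joined with a space; returns '' if nothing matches.
function extractWords(str, tags) {
  const alternation = tags.map(escapeRegExp).join('|');
  const regex1 = new RegExp(
    '\\[X\\]([\\wäëüö]+).*?\\[(?:' + alternation + ')\\](.*)', 'i');
  const m = str.match(regex1);
  if (!m) return '';
  // Keep only items that are entirely word characters (assumed word pattern).
  const wordPattern = /^[\wäëüö]+$/i;
  const words = m[2].split(/\[[^\]]*\]/).filter(s => wordPattern.test(s));
  return [m[1], ...words].join(' ');
}

const tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];
console.log(extractWords(
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht',
  tags)); // sunset paintball yacht
```

Note that, like the original regex1, this returns nothing when the string has an [X] word but no tag from the array; handling that case separately would need a different pattern.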