Regex to select one or multiple words after brackets

Time:10-12

Here is the problem I am trying to solve:

I have a dictionary of tags in JS:

Tags= ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5']

I have a database request that provides me with a string. I need to pick certain values from the string using Regex with the following conditions:

  1. I would like to pick the word (including ä, ë, ü, ö) that comes immediately after [X] (the letter X between brackets), but NOT any of the words that follow later in the string:

    • [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]
      • Expected output: sunset
  2. I would like to pick the word OR the words (including ä,ë,ü,ö) that come after one of the tags in my dictionary variable, even if there are other brackets in between:

  • [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: a
  • [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag] -- Expected output: paintball
  • [UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht -- Expected output: paintball, yacht.
  • [UnRelatedTag][X]sunset[Y]beach[Tag1]snowball -- Expected output: snowball.

I tried hard and had other people help in another thread, but it is still not working.

  3. The final result will be used in JS to concatenate the word after [X] and the single/multiple words after [Tag] into one string.

Thank you for your help

CodePudding user response:

You can simply capture whatever lies between a left boundary and a right boundary with a lazy group: left boundary (.*?) right boundary

Example: \[X\](.*?)\[Y\] captures sunset

You can use https://regex101.com/ to test and refine the pattern.
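
A minimal sketch of the boundary idea in JS, assuming the right boundary is simply the next opening bracket (or the end of the string). The helper name wordAfterX is hypothetical and not part of the question:

```javascript
// Hypothetical helper: returns the text between [X] and the next "[" (or the
// end of the string), or null when the string contains no [X] tag.
function wordAfterX(str) {
  // \[X\]      left boundary: the literal tag [X]
  // (.*?)      lazy capture: as few characters as possible
  // (?:\[|$)   right boundary: the next opening bracket or the end of input
  const m = str.match(/\[X\](.*?)(?:\[|$)/);
  return m ? m[1] : null;
}

console.log(wordAfterX('[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball')); // sunset
```

Because the group is lazy, it stops at the first bracket after [X], so beach and snowball are not picked up.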

CodePudding user response:

Here are example input strings and code to extract the words based on your spec:

const strings = [
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht',
  '[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball'
];
const tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];
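// Group 1: the word right after [X]; group 2: everything after the first tag from the array
const regex1 = new RegExp('\\[X\\]([\\wäëüö]+).*?\\[(?:' + tags.join('|') + ')\\](.*)', 'i');
// Matches any bracketed tag; used to split group 2 into words
const regex2 = /\[[^\]]*\]/;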

strings.forEach(str => {
  let result = [];
  let m = str.match(regex1);
  if(m) {
    result.push(m[1]);
    m[2].split(regex2).filter(Boolean).forEach(s => {
      result.push(s);
    });
  }
  console.log(str + '\n ==> ' + result.join(', '));
});
Output:

[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]a[UnrelatedTag][UnrelatedTag][UnrelatedTag]
 ==> sunset, a
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]
 ==> sunset, paintball
[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht
 ==> sunset, paintball, yacht
[UnRelatedTag][X]sunset[Y]beach[Tag1]snowball
 ==> sunset, snowball

Explanation:

  • regex1:
    • built dynamically based on your tag array
    • two capture groups, one for the word after tag [X], one for all text that follows a tag in your tag array
  • if there is a match:
    • the first capture group is added to the result
    • the second capture group:
      • is split on tag pattern [...]
      • the .filter(Boolean) filters out empty strings
      • you could filter further by word pattern of interest
      • each split item is added to the result
  • join the result array with any delimiter you want, here a comma followed by a space
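
The steps above can be wrapped into one function that returns the single string the question asks for. This is only a sketch: the names escapeRegex and extractWords are hypothetical, the word filter assumes "word" means word characters plus ä, ë, ü, ö, and the escaping step is an extra precaution in case a tag ever contains regex metacharacters.

```javascript
// Hypothetical helper: escape regex metacharacters so a tag is matched literally.
const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: returns the word after [X] followed by the words that
// come after the first tag from the tags array, or [] if there is no match.
function extractWords(str, tags) {
  const tagPattern = tags.map(escapeRegex).join('|');
  const regex = new RegExp(
    '\\[X\\]([\\wäëüö]+).*?\\[(?:' + tagPattern + ')\\](.*)', 'i');
  const m = str.match(regex);
  if (!m) return [];
  // Assumed word pattern: only word characters and the umlauts from the question.
  const wordOnly = /^[\wäëüö]+$/i;
  // Split the tail on any [...] tag and keep only pieces that look like words;
  // this also drops the empty strings produced by adjacent tags.
  const rest = m[2].split(/\[[^\]]*\]/).filter(s => wordOnly.test(s));
  return [m[1], ...rest];
}

const tags = ['Tag1', 'Tag2', 'Tag3', 'Tag4', 'Tag5'];
const input = '[UnRelatedTag][X]sunset[Y]beach[Tag1][UnrelatedTag]paintball' +
  '[UnrelatedTag][UnrelatedTag][UnrelatedTag]yacht';

console.log(extractWords(input, tags).join(' ')); // sunset paintball yacht
```

Returning an array and joining at the call site keeps the choice of delimiter with the caller.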