With regex, how to select first 3 words (commas/other characters included)?

Time:11-06

I'm practicing some regex and trying to get only Regular, Expressions, and abbreviated from the text below:

Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.

With (\w+\S?), I get every word, each with a trailing non-whitespace character if one is present.

How would I get just Regular, Expressions, (with its comma) and abbreviated?

Edit:

To clarify, I'm looking for Regular, Expressions, and abbreviated as separate matches, without spaces,

not "Regular Expressions, abbreviated" as a single match (spaces included here).

CodePudding user response:

Regex can't "select". It can only match and capture.

This captures the first 3 words (including optional trailing comma) as groups 1, 2 and 3:

^(\w+,?)\s+(\w+,?)\s+(\w+,?)
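As a rough illustration, here is how that pattern might be applied in JavaScript (the language used in the other answer on this page). The variable names are made up for this sketch, and the sample text is the sentence from the question.

```javascript
const text = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules.";

// Anchored at the start of the string; each group captures one word
// plus an optional trailing comma. \s+ skips the whitespace in between.
const firstThree = /^(\w+,?)\s+(\w+,?)\s+(\w+,?)/;

const match = text.match(firstThree);
// match[0] is the whole match; indexes 1 to 3 hold the captured words.
const words = match ? match.slice(1, 4) : [];

console.log(words); // [ 'Regular', 'Expressions,', 'abbreviated' ]
```

Because the whitespace sits outside the groups, each captured word comes back without spaces, while the comma stays attached to Expressions.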


CodePudding user response:

As @Bohemian has pointed out, a regex cannot select; it can only match and capture. If the regex implementation you use supports it, the captured groups are returned as part of the match. In JS, for example, this gives you the results separately.

Capturing groups are created by wrapping in parentheses the part of the match that you want to extract.

To match those three specific words, the regex would be the following:

/(Regular) (Expressions), (abbreviated)/

Note that the words you care about are inside the parentheses, while the parts of the string you don't want (like spaces and commas) are outside them.

You would use it like this (JavaScript code):

const string = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules." 
const regex = /(Regular) (Expressions), (abbreviated)/; 
string.match(regex); // returns [ "Regular Expressions, abbreviated", "Regular", "Expressions", "abbreviated" ]

Note that the first element of the result is the whole match, and the 2nd, 3rd and 4th elements are your capture groups, which you can use as if you had selected them from the string.

To match any three words separated by a space or comma, you could use

/(\w+),?\s?(\w+),?\s?(\w+),?\s?/

\w matches a word character (and \w+ one or more of them), \s matches a whitespace character, and ? means that whatever precedes it occurs 0 or 1 times. Finally, the parentheses group each word and leave everything else out, the same as in the example above.

You would use it like this (JavaScript code):

const string = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters created within the framework of Regex syntax rules." 
const regex = /(\w+),?\s?(\w+),?\s?(\w+),?\s?/; 
string.match(regex); // returns [ "Regular Expressions, abbreviated ", "Regular", "Expressions", "abbreviated" ] (the full match includes the trailing space consumed by the last \s?)
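If the number of words is not fixed, one alternative is to match every word with the global flag and keep only the first few. This is a sketch and not part of either answer above; the helper name firstWords and its parameters are hypothetical, chosen only for this example.

```javascript
// Hypothetical helper: returns the first `count` words of `text`.
// When `keepComma` is true, a comma directly after a word stays attached.
function firstWords(text, count, keepComma = true) {
  const pattern = keepComma ? /\w+,?/g : /\w+/g;
  // With the g flag, match() returns every match, or null if there are none.
  const all = text.match(pattern) || [];
  return all.slice(0, count);
}

const sentence = "Regular Expressions, abbreviated as Regex or Regexp, are a string of characters.";

console.log(firstWords(sentence, 3));        // [ 'Regular', 'Expressions,', 'abbreviated' ]
console.log(firstWords(sentence, 3, false)); // [ 'Regular', 'Expressions', 'abbreviated' ]
```

Each element is a separate match with no surrounding spaces, which seems to be what the question's edit asks for. The trade-off is that the global flag discards capture groups, so this only works when the whole match is the part you want.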