Break string into words using scan method regexp, if word has `'` character, drop this charac

Time:10-14

sample_string = "let's could've they'll you're won't"
sample_string.scan(/\w+/)

Above gives me:

["let", "s", "could", "ve", "they", "ll", "you", "re", "won", "t"]

What I want:

["let", "could", "they", "you", "won"]

I've been playing around on https://rubular.com/ and trying assertions like \w+(?<=') but with no luck.

CodePudding user response:

You can use

sample_string.scan(/(?<![\w'])\w+/)
sample_string.scan(/\b(?<!')\w+/)

See the Rubular demo. The two patterns are equivalent. Their parts match:

  • (?<![\w']) - a location in the string that is not immediately preceded by a word char or a ' char
  • \b(?<!') - a word boundary position that is not immediately preceded by a ' char
  • \w+ - one or more word chars.

See the Ruby demo:

sample_string = "let's could've they'll you're won't"
p sample_string.scan(/(?<![\w'])\w+/)
# => ["let", "could", "they", "you", "won"]
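
As a quick check that the two patterns behave the same on this input, here is a minimal sketch that runs both and compares the results. The constant names LOOKBEHIND_ONLY and BOUNDARY_FORM are made up for illustration and are not part of the original answer.

```ruby
# Both patterns from the answer, stored in hypothetical constants.
LOOKBEHIND_ONLY = /(?<![\w'])\w+/
BOUNDARY_FORM   = /\b(?<!')\w+/

sample_string = "let's could've they'll you're won't"

lookbehind_words = sample_string.scan(LOOKBEHIND_ONLY)
boundary_words   = sample_string.scan(BOUNDARY_FORM)

p lookbehind_words                    # => ["let", "could", "they", "you", "won"]
p boundary_words                      # => ["let", "could", "they", "you", "won"]
p lookbehind_words == boundary_words  # => true
```

In both cases the fragment after the apostrophe ("s", "ve", "ll", ...) is skipped because the only place a match could start there is immediately preceded by a ' char.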

CodePudding user response:

Given:

> sample_string = "let's could've they'll you're won't"

You can do split and map:

> sample_string.split.map{|w| w.split(/'/)[0]}
=> ["let", "could", "they", "you", "won"]
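
One difference worth knowing between the two answers: split only breaks on whitespace, so punctuation attached to a word is kept, while the scan pattern returns word characters only. The sketch below illustrates this on a made-up input; the helper names words_by_split and words_by_scan are hypothetical and exist only for the comparison.

```ruby
# Hypothetical helpers wrapping the two approaches for a side-by-side check.
def words_by_split(text)
  # Split on whitespace, then keep only the part before the first apostrophe.
  text.split.map { |word| word.split("'")[0] }
end

def words_by_scan(text)
  # Match runs of word chars that are not preceded by a word char or an apostrophe.
  text.scan(/(?<![\w'])\w+/)
end

punctuated = "don't stop, it's fine"

p words_by_split(punctuated)  # => ["don", "stop,", "it", "fine"]
p words_by_scan(punctuated)   # => ["don", "stop", "it", "fine"]
```

If the input can contain commas, periods, or similar characters, the scan version is probably the safer choice; for plain space-separated words both give the same result.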