Ruby: split string on parentheses and space

Time:10-13

I want to use a regex to split a string on spaces and parentheses.

Example:

"The (New York city) :) is big"

=> Output:

["The", "New York city", ":)", "is", "big"]

I have tried this expression: /\([^\)]+?[\)]|[^ ]+/

but the parentheses are still in the output, which is not what I want:

["The", "(New York city)", ":)", "is", "big"]

Does anybody have an idea? Thanks.

CodePudding user response:

Actually, you can capture the parts you need and then, after applying scan, remove all nil array items (they occur because each match fills only one of the two capturing groups):

text = "The (New York city) :) is big"
arr = text.scan(/\(([^()]+)\)|(\S+)/).flatten - [nil]
# Or
# arr = text.scan(/\(([^()]+)\)|(\S+)/).flatten.compact
p arr # => ["The", "New York city", ":)", "is", "big"]

See the Ruby demo and the Rubular demo.

Details:

  • \( - a ( char
  • ([^()]+) - Group 1: one or more chars other than ( and )
  • \) - a ) char
  • | - or
  • (\S+) - Group 2: one or more non-whitespace chars.
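
To see where the nil items come from, here is a small sketch that inspects the intermediate result of scan before flattening. The variable names `pattern`, `pairs` and `words` are only illustrative and not part of the answer above.

```ruby
text = "The (New York city) :) is big"
pattern = /\(([^()]+)\)|(\S+)/

# With two capturing groups, scan returns one [group1, group2] pair per match;
# the group that did not take part in the match is nil.
pairs = text.scan(pattern)
p pairs
# => [[nil, "The"], ["New York city", nil], [nil, ":)"], [nil, "is"], [nil, "big"]]

# Flattening the pairs and dropping the nils leaves only the captured text.
words = pairs.flatten.compact
p words
# => ["The", "New York city", ":)", "is", "big"]
```

Note that ":)" is picked up by Group 2 because it does not start with "(", so its closing parenthesis is kept.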

CodePudding user response:

One may write:

str = "The (New York city) :) is big"
str.gsub(/\(.*?\)|\S+/).with_object([]) do |s,a|
  a << (s[0]=='(' && s[-1] == ')' ? s[1..-2] : s)
end

This uses the form of String#gsub that takes one argument (here a regular expression) and no block, returning an enumerator:

enum = str.gsub(/\(.*?\)|\S+/)
  #=> #<Enumerator: "The (New York city) :) is big":gsub(/\(.*?\)|\S+/)>

We can see the (string) objects that will be generated by the enumerator by converting it to an array:

enum.to_a
  #=> ["The", "(New York city)", ":)", "is", "big"]

We can make the regular expression self-documenting by expressing it in free-spacing mode:

/
\(    # match '('
.*?   # match zero or more characters, lazily
\)    # match ')'
|     # or
\S+   # match one or more characters other than white spaces
/x    # free-spacing regex definition mode
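
As a rough sketch of how the free-spacing regex could be used end to end, the snippet below wraps the approach in a small helper. The names `PATTERN` and `split_words` are hypothetical, and `map` is used here in place of `with_object` purely for brevity; the matching logic is the same as in the answer above.

```ruby
# Free-spacing (x) version of the regex: whitespace and comments are ignored.
PATTERN = /
  \(.*?\)   # a parenthesized chunk, matched lazily
  |         # or
  \S+       # one or more non-whitespace characters
/x

# Hypothetical helper: returns the matches, with the enclosing
# parentheses stripped from any match of the form "(...)".
def split_words(str)
  str.gsub(PATTERN).map do |s|
    s.start_with?("(") && s.end_with?(")") ? s[1..-2] : s
  end
end

result = split_words("The (New York city) :) is big")
p result
# => ["The", "New York city", ":)", "is", "big"]
```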