I want to use a regex to split a string on spaces and parentheses.
Example:
"The (New York city) :) is big"
=> Output:
["The", "New York city", ":)", "is", "big"]
I have tried this expression: /\([^\)]+?[\)]|[^ ]+/
but the parentheses are still in the result, which is not what I want:
["The", "(New York city)", ":)", "is", "big"]
Does anybody have an idea? Thanks.
CodePudding user response:
You can capture the parts you need and then, after applying scan, remove all nil array items (these occur because each match fills only one of the two capturing groups):
text = "The (New York city) :) is big"
arr = text.scan(/\(([^()]+)\)|(\S+)/).flatten - [nil]
# Or
# arr = text.scan(/\(([^()]+)\)|(\S+)/).flatten.compact
p arr # => ["The", "New York city", ":)", "is", "big"]
See the Ruby demo and the Rubular demo.
Details:
\( - a ( char
([^()]+) - Group 1: one or more chars other than ( and )
\) - a ) char
| - or
(\S+) - Group 2: one or more non-whitespace chars.
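To make the nil entries concrete, here is a minimal sketch of what scan returns before flattening. The variable names pairs and tokens are only illustrative, and the map at the end is an alternative to flatten.compact rather than part of the answer above.

```ruby
text = "The (New York city) :) is big"

# With two capturing groups, scan yields one [group1, group2] pair per match.
# Only the group belonging to the alternative that matched is filled in.
pairs = text.scan(/\(([^()]+)\)|(\S+)/)
p pairs
# => [[nil, "The"], ["New York city", nil], [nil, ":)"], [nil, "is"], [nil, "big"]]

# Pick whichever group matched instead of flattening and removing nils.
tokens = pairs.map { |inner, word| inner || word }
p tokens
# => ["The", "New York city", ":)", "is", "big"]
```

Either approach gives the same result here. The map version keeps each token tied to its match, which may be easier to extend if more alternatives are added later.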
CodePudding user response:
One may write:
str = "The (New York city) :) is big"
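str.gsub(/\(.*?\)|\S+/).with_object([]) do |s,a|
  # strip the enclosing parentheses from a parenthesized match
  a << (s[0]=='(' && s[-1] == ')' ? s[1..-2] : s)
end
#=> ["The", "New York city", ":)", "is", "big"]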
This uses the form of String#gsub that takes one argument--here a regular expression--and no block, returning an enumerator:
enum = str.gsub(/\(.*?\)|\S+/)
#=> #<Enumerator: "The (New York city) :) is big":gsub(/\(.*?\)|\S+/)>
We can see the (string) objects that will be generated by the enumerator by converting it to an array:
enum.to_a
#=> ["The", "(New York city)", ":)", "is", "big"]
We can make the regular expression self-documenting by expressing it in free-spacing mode:
/
\( # match '('
.*? # match zero or more characters, lazily
\) # match ')'
| # or
\S+ # match one or more characters other than white spaces
/x # free-spacing regex definition mode
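As a sketch of how the free-spacing pattern could be put to use, the snippet below stores it in a constant and wraps the gsub enumerator in a small helper. The names TOKEN and split_tokens are hypothetical, chosen only for this illustration.

```ruby
# Free-spacing (/x) version of the pattern; whitespace and comments are ignored.
TOKEN = /
  \( .*? \)   # a parenthesized group, matched lazily
  |           # or
  \S+         # a run of non-whitespace characters
/x

# Hypothetical helper: returns the tokens, with enclosing parentheses removed.
def split_tokens(str)
  # gsub with a pattern and no block returns an enumerator over the matches
  str.gsub(TOKEN).map do |s|
    s.start_with?("(") && s.end_with?(")") ? s[1..-2] : s
  end
end

p split_tokens("The (New York city) :) is big")
# => ["The", "New York city", ":)", "is", "big"]
```

Note that the lazy .*? stops at the first closing parenthesis, so nested parentheses would not be handled by this sketch.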