How do I convert a String: 'Hello world!'
to an array: ['Hello', ' ', 'world!']
with all spaces preserved?
I tried to convert the string using the split method with different parameters, but I didn't find the right solution.
Also I didn't find any other method in the documentation (Class: String (Ruby 3.1.0)) suitable for solving this problem.
CodePudding user response:
It just occurred to me that you could use scan. Assuming that your string is stored in the variable s, and you want to separate space regions and non-space regions, you could do
s.scan(/[ ]+|[^ ]+/)
which would yield in your case
["Hello", " ", "world!"]
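As a quick check of the scan approach, here is a minimal sketch. The helper name space_tokens is made up for illustration and is not part of Ruby. It shows that a run of several spaces stays together as one element and that joining the pieces gives back the original string:

```ruby
# Hypothetical helper (name is an assumption, not a Ruby built-in):
# split a string into alternating runs of spaces and non-spaces.
def space_tokens(s)
  s.scan(/[ ]+|[^ ]+/)
end

tokens = space_tokens('Hello   world!')
p tokens       # a run of several spaces is kept as a single element
p tokens.join  # joining the elements rebuilds the original string
```

Because every character is either a space or a non-space, nothing is dropped by this pattern, so the round trip through join should be lossless.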
CodePudding user response:
Use String#scan Instead of String#split
You don't want to use String#split here because it discards the separators it matches (unless the pattern contains a capture group), so your spaces won't be preserved. You want to use String#scan or String#partition instead. Using Unicode character properties, you can scan for matches with:
'Hello world!'.scan /[\p{Alnum}\p{Punct}]+|\p{Space}+/
#=> ["Hello", " ", "world!"]
You can also use POSIX character classes (called "bracket expressions" in Ruby) to do the same thing if you prefer. For example:
'Hello world!'.scan /[[:alnum:][:punct:]]+|[[:space:]]+/
#=> ["Hello", " ", "world!"]
Either of these options will be more robust than solutions that rely on ASCII-only characters or literal whitespace atoms, but if you know your strings won't include other types of characters or encodings then those solutions will work too.
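To make the robustness point concrete, here is a small comparison sketch. Both helper names are hypothetical, and the sample string (a tab separator plus accented letters) is just an assumed test input:

```ruby
# Hypothetical helpers for comparison; the names are illustrative only.
def literal_space_tokens(s)
  s.scan(/[ ]+|[^ ]+/)
end

def posix_tokens(s)
  s.scan(/[[:alnum:][:punct:]]+|[[:space:]]+/)
end

sample = "héllo\twörld!"

# The literal-space pattern only recognizes ' ' as a separator, so the tab
# is treated as part of the word and the string comes back as one element.
p literal_space_tokens(sample)

# The bracket-expression pattern treats the tab as whitespace and the
# accented letters as alphanumeric, so it returns three elements.
p posix_tokens(sample)
```

One thing to keep in mind: scan silently skips any character that none of the alternatives match, so with the class-based patterns it is worth checking that your input only contains characters covered by the classes you list.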
Using String#partition
For the very simple use case in your original example, you only have two words separated by whitespace. That means you can also use String#partition to partition on the sequential whitespace. That will split the string into exactly three elements, preserving the whitespace that partitions the words. For example:
'Hello world!'.partition /\s+/
#=> ["Hello", " ", "world!"]
While simpler, the partitioning approach won't work as well with longer strings such as:
'Goodbye cruel world!'.partition /\s+/
#=> ["Goodbye", " ", "cruel world!"]
so String#scan is going to be a better and more flexible approach for the general use case. However, anytime you want to split a string into three elements, or to preserve the partitioning element itself, #partition can be very handy.
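A short side-by-side sketch of the two approaches on the longer string, plus the case where there is no whitespace at all. The helper names are assumptions for illustration, and the \S+|\s+ pattern is just one possible way to write the scan:

```ruby
# Hypothetical helper: always returns exactly three elements,
# [before, separator, after], split at the first whitespace run.
def split_at_first_whitespace(s)
  s.partition(/\s+/)
end

# Hypothetical helper: returns every word and every whitespace run.
def all_tokens(s)
  s.scan(/\S+|\s+/)
end

p split_at_first_whitespace('Goodbye cruel world!')  # only the first gap is split
p all_tokens('Goodbye cruel world!')                 # every gap is split

# With no match, partition still returns three elements: the whole
# string followed by two empty strings.
p split_at_first_whitespace('Hello')
```

So partition is a good fit when you always want a fixed three-part result, while scan scales to any number of words.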