How to keep the white space when splitting a sentence into a list?

Below is the code that splits the sentence "s".

s = "1 a 3 bb  b8"
b = s.split()
print(b)

The output from the above code is ['1', 'a', '3', 'bb', 'b8'].

The desired output is ['1', 'a', '3', 'bb', ' b8']. Be aware that there is only one white space in the last field.

CodePudding user response：

This is a tricky one: it is hard to do with a generic function, so it requires some custom code.

I used s = "1 a 3 bb   b8", with 3 white spaces before b8, to make it more fun :)

So the first thing you can do is explicitly specify the delimiter in your split:

s.split(' ')

Would give the following result: ['1', 'a', '3', 'bb', '', '', 'b8']
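To make the difference concrete, here is a small sketch comparing the two calls on the sample string with three spaces before b8. The variable names default_split and explicit_split are only for illustration.

```python
# Sample string with three spaces before "b8".
s = "1 a 3 bb   b8"

# No argument: any run of whitespace counts as a single separator,
# so the extra spaces disappear.
default_split = s.split()

# Explicit ' ': every single space is a separator, so each extra
# space shows up as an empty string in the result.
explicit_split = s.split(' ')

print(default_split)
print(explicit_split)
```

The empty strings in the second list are what carry the information about the extra spaces.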

Now you have to interpret each '' as a ' ' that needs to be added to the next non-empty string. In the following for loop you implement the "business rules" that put the white spaces in the expected place.

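# split on single spaces; each extra space becomes an empty string
temp_split = s.split(' ')

split_list = []
buffer = ''
for elt in temp_split:
    if elt != "":
        # prepend the collected spaces to the next non-empty field
        split_list.append(buffer + elt)
        buffer = ''
    else:
        # an empty string stands for one extra space
        buffer += ' '
print(split_list)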

And the result is: ['1', 'a', '3', 'bb', '  b8'] (two leading spaces in the last field, because the sample string has three spaces before b8).
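If you need this in more than one place, the loop can be wrapped in a function. This is only a sketch: the name split_keep_extra_spaces is made up here, and it assumes trailing spaces at the end of the string may be dropped.

```python
def split_keep_extra_spaces(text):
    """Split on single spaces; extra spaces are prepended to the next field."""
    fields = []
    buffer = ''
    for elt in text.split(' '):
        if elt != '':
            # Attach the collected spaces to the next non-empty field.
            fields.append(buffer + elt)
            buffer = ''
        else:
            # Each empty string stands for one extra space.
            buffer += ' '
    # Assumption: spaces still in the buffer (trailing spaces) are discarded.
    return fields


# The string from the question (two spaces before "b8").
print(split_keep_extra_spaces("1 a 3 bb  b8"))
# The string used in this answer (three spaces before "b8").
print(split_keep_extra_spaces("1 a 3 bb   b8"))
```

With the string from the question, the last field gets exactly one leading space, which matches the desired output.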

CodePudding user response：

The code is not the best and not very efficient, but it works. It distinguishes spaces that act as field separators from spaces that are data by replacing the latter with a special string (e.g. $KEEP_THAT_SPACE$). In the next step the string is split on the spaces that act as field separators. Then the special strings in all elements are replaced back with a blank.

#!/usr/bin/env python3
s = "1 a 3 bb  b8"

# assumes that fields are separated by at most two consecutive spaces
keep_placeholder = '$KEEP_THAT_SPACE$'

s = s.replace('  ', f' {keep_placeholder}')

b = s.split()

# restore the kept spaces: replace every placeholder in each field with a blank
for index, element in enumerate(b):
    b[index] = element.replace(keep_placeholder, ' ')

print(b)

The output is ['1', 'a', '3', 'bb', ' b8']. Note that there is only one blank space at the beginning of the last field.

The code can easily be adapted if you have fields with more than two blank spaces.
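One possible way to adapt it to longer runs of spaces is sketched below. It swaps the plain str.replace for re.sub, so that in every run of two or more spaces the first space stays a separator and each further space becomes one placeholder. The function name split_with_placeholder is hypothetical, and the sketch assumes the placeholder string never occurs in the input.

```python
import re

# Assumption: this marker never appears in the input text.
KEEP_PLACEHOLDER = '$KEEP_THAT_SPACE$'


def split_with_placeholder(text, placeholder=KEEP_PLACEHOLDER):
    # In each run of two or more spaces, keep the first space as the
    # field separator and turn every further space into one placeholder.
    marked = re.sub(
        r' ( +)',
        lambda match: ' ' + placeholder * len(match.group(1)),
        text,
    )
    # Split on the remaining separators, then turn placeholders back
    # into blanks.
    return [field.replace(placeholder, ' ') for field in marked.split()]


print(split_with_placeholder("1 a 3 bb  b8"))
print(split_with_placeholder("1 a 3 bb    b8"))
```

The replacement is built in a function rather than a replacement string so that the placeholder is inserted literally, whatever characters it contains.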