Split string in Python while keeping the line break inside the generated list

As simple as it sounds, I can't think of a straightforward way to do the following in Python.

my_string = "This is a      test.\nAlso\tthis"
list_i_want = ["This", "is", "a", "test.", "\n", "Also", "this"]

I need the same behaviour as string.split(), i.e. any kind and amount of whitespace is removed, except for line breaks (\n), each of which should be kept as a standalone list item.
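For reference, here is a minimal sketch of what plain split() does with the same input. The name plain_split is only illustrative; the point is that the line break is treated like any other whitespace and disappears.

```python
my_string = "This is a      test.\nAlso\tthis"

# str.split() with no arguments splits on any run of whitespace,
# including "\n", so the line break does not survive.
plain_split = my_string.split()
print(plain_split)  # ['This', 'is', 'a', 'test.', 'Also', 'this']
```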

How could I do this?

CodePudding user response：

Here's some code that works, although it is not particularly efficient or Pythonic:

my_string = "This is a      test.\nAlso\tthis"
lines = my_string.splitlines()  # split into lines; the line breaks themselves are dropped
list_i_want = []
for line in lines:
    list_i_want.extend(line.split())  # add the words of this line
    list_i_want.append('\n')  # add the line break back as its own item

if list_i_want:
    list_i_want.pop()  # remove the extra newline added after the last line
print(list_i_want)

Output:

['This', 'is', 'a', 'test.', '\n', 'Also', 'this']

CodePudding user response：

Split the string using re.findall()

import re

my_string = "This is a      test.\nAlso\tthis"
my_list = re.findall(r"\S+|\n", my_string)

print(my_list)

How it Works:

"\S+": "\S" matches any non-whitespace character, and "+" is a greedy quantifier, so together they match each run of non-whitespace characters, i.e. each word
"|": OR (alternation)
"\n": matches a line break, so it is returned in the list as well

Output:

['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
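If this is needed in more than one place, the same pattern could be wrapped in a small helper. This is only a sketch: the function name split_keep_newlines is made up for illustration, and it assumes that every "\n" (including repeated or trailing ones) should become its own list item.

```python
import re

# Precompiled pattern: a run of non-whitespace characters, or a single "\n".
_TOKEN_RE = re.compile(r"\S+|\n")


def split_keep_newlines(text):
    """Split on whitespace like str.split(), but keep each "\n" as its own item."""
    return _TOKEN_RE.findall(text)


print(split_keep_newlines("This is a      test.\nAlso\tthis"))
print(split_keep_newlines("a\n\nb"))  # consecutive line breaks are kept separately
```

Note that, unlike the splitlines() loop, this version also keeps a trailing "\n" at the end of the input as a list item, which may or may not be what you want.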