As simple as it sounds, I can't think of a straightforward way of doing the following in Python.
my_string = "This is a test.\nAlso\tthis"
list_i_want = ["This", "is", "a", "test.", "\n", "Also", "this"]
I need the same behaviour as string.split(), i.e. remove any type and number of whitespace characters, except for line breaks (\n), each of which I need as a standalone list item.
How could I do this?
CodePudding user response:
Here's some code that works, but it is definitely not efficient or Pythonic:
my_string = "This is a test.\nAlso\tthis"
lines = my_string.splitlines()  # Split the string into lines
list_i_want = []
for line in lines:
    list_i_want.extend(line.split())  # Add the whitespace-separated words of this line
    list_i_want.append('\n')  # Add a newline item after each line
list_i_want.pop()  # Remove the extra newline added after the last line
print(list_i_want)
Output:
['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
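As a rough sketch, the same line-by-line idea can be wrapped in a function that splits on "\n" only, so a trailing newline in the input is kept rather than dropped by splitlines(). The function name split_keep_newlines is made up for illustration.

```python
def split_keep_newlines(text):
    """Split on whitespace, keeping each "\n" as its own list item."""
    tokens = []
    # str.split("\n") keeps empty pieces, so consecutive and trailing
    # newlines are preserved as separate "\n" items.
    for index, line in enumerate(text.split("\n")):
        if index > 0:
            tokens.append("\n")  # one item per newline between pieces
        tokens.extend(line.split())  # words of this line, other whitespace dropped
    return tokens


print(split_keep_newlines("This is a test.\nAlso\tthis"))
# ['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
```

Unlike the splitlines() version, this treats only "\n" as a line break, which matches the question as asked.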
CodePudding user response:
Split String using Regex findall()
import re
my_string = "This is a test.\nAlso\tthis"
my_list = re.findall(r"\S+|\n", my_string)
print(my_list)
How it Works:
- "\S+": "\S" matches any non-whitespace character. "+" is a greedy quantifier, so it matches runs of one or more non-whitespace characters, i.e. words
- "|": OR logic
- "\n": Matches "\n" so that each newline is returned in the list as well
Output:
['This', 'is', 'a', 'test.', '\n', 'Also', 'this']
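For reuse, the pattern can be compiled once and wrapped in a small helper. This is only a sketch: the names TOKEN_PATTERN and tokenize are made up for illustration. The second call shows how consecutive and trailing newlines come out, each as its own item.

```python
import re

# \S+ matches a run of non-whitespace characters (a word);
# \n matches a single newline. All other whitespace is skipped.
TOKEN_PATTERN = re.compile(r"\S+|\n")


def tokenize(text):
    """Return words and newlines as separate list items."""
    return TOKEN_PATTERN.findall(text)


print(tokenize("This is a test.\nAlso\tthis"))
# ['This', 'is', 'a', 'test.', '\n', 'Also', 'this']

print(tokenize("one\n\ntwo\n"))
# ['one', '\n', '\n', 'two', '\n']
```

Note that the splitlines() approach above would drop the trailing newline in the second example, while the regex keeps it.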