How to split a string in python by certain characters?

Time:01-04

I am trying to solve a problem with prefix notation, but I am stuck on the part where I want to split my string into an array. If I have the input +22 2, I want the array to look like this: ['+', '22', '2']. I tried using the

import re 

module, but I am not sure how it works. I also tried the

word.split(' ')

method, but it only helps with the spaces. Any ideas? P.S.: In the prefix notation I will also have - and *, so I need to split the string so that the spaces are not in the array but +, - and * are. I am thinking of

word = input()
array = word.split(' ')

After that, I am thinking of splitting each piece of the string by these three characters.

Sample input: '+-12 23*67 1'

Output: ['+', '-', '12', '23', '*', '67', '1']

Answer:

You can use re to find patterns in text. It seems you are looking for either one of the characters +, - and *, or a group of digits. So compile a pattern that looks for either of those, find everything that matches it, and you will get a list:


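import re

# Match either a single operator (-, + or *) or a run of one or more digits.
# Spaces match neither alternative, so findall skips over them.
pattern = re.compile(r'([-+*]|\d+)')

string = '+-12 23*67 1'
array = pattern.findall(string)
print(array)

# Output:
# ['+', '-', '12', '23', '*', '67', '1']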

Also a bit of testing (comparing your sample strings with the expected output):

test_cases = {
    '+-12 23*67 1': ['+', '-', '12', '23', '*', '67', '1'],
    '+22 2': ['+', '22', '2']
}

for string, correct in test_cases.items():
    assert pattern.findall(string) == correct

print('Tests completed successfully!')
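
Since the question asks about splitting specifically, here is a rough sketch of the same result using re.split instead of findall. The function name split_tokens is made up for illustration, and the sample input is assumed to be '+-12 23*67 1'. Whitespace is used as a separator and thrown away, while the operators sit in a capturing group so that re.split keeps them in the result:

```python
import re

def split_tokens(text):
    # Hypothetical helper: split on runs of whitespace (discarded) or on a
    # single operator (kept, because it is inside a capturing group).
    parts = re.split(r'\s+|([-+*])', text)
    # re.split produces '' between adjacent separators and None whenever the
    # capturing group did not take part in a match, so drop both.
    return [part for part in parts if part]

print(split_tokens('+-12 23*67 1'))
# ['+', '-', '12', '23', '*', '67', '1']
```

The findall version is probably the simpler of the two, because it lists what to keep instead of what to cut on and needs no filtering step afterwards.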

Pattern explanation (the re module docs listed below cover this in more detail):
r'([-+*]|\d+)'
  • r in front makes it a raw string, so Python leaves backslashes alone and \d reaches the regex engine unchanged instead of being treated as a string escape sequence
  • (...) parentheses indicate a group that can be retrieved later; they are not necessary here, and with a single group findall simply returns what that group matched
  • [...] is a character set that matches any single character listed inside, so here it matches one of -, + or *; the - comes first so that it is read literally rather than as a range
  • | is a logical or, meaning either side can match (here it separates the operators from the numbers)
  • \d is the special sequence for a digit, and the + after it means one or more, so \d+ matches a whole run of digits such as 12
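
To see what each part of the pattern contributes, the two alternatives can be tried on their own. This is only a small sketch, assuming the sample input is '+-12 23*67 1'; the variable names are chosen for illustration. It also shows that the pattern gives the same list without the parentheses:

```python
import re

sample = '+-12 23*67 1'

# The character set alone: one operator per match.
operators = re.findall(r'[-+*]', sample)
print(operators)  # ['+', '-', '*']

# The digit part alone: each run of digits is one match.
numbers = re.findall(r'\d+', sample)
print(numbers)    # ['12', '23', '67', '1']

# Both joined with |, without the optional parentheses.
tokens = re.findall(r'[-+*]|\d+', sample)
print(tokens)     # ['+', '-', '12', '23', '*', '67', '1']
```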

Useful:

  • The re module documentation, which explains what each character in a pattern does