I have a string like this:
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
and I want this output:
Output = ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
I tried several approaches, but I always end up splitting "West Bengal" into two parts, so I can't handle that edge case.
What I tried was to pass the string through the function below and then split the result, but it doesn't work:
def remove_alpha(string):
    option = ['[A]', '[B]', '[C]', '[D]']
    # Remove every option label from the string
    for i in option:
        string = string.replace(i, '')
    return string
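To see why the replace-then-split approach fails, here is a minimal sketch assuming the input string from the question and a plain whitespace str.split(). Once the labels are gone, nothing distinguishes the space inside "West Bengal" from the spaces between entries:

```python
def remove_alpha(string):
    # Delete each option label; the spaces around the labels stay behind
    for label in ['[A]', '[B]', '[C]', '[D]']:
        string = string.replace(label, '')
    return string

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
cleaned = remove_alpha(string)
print(repr(cleaned))  # ' Assam Meghalaya West Bengal Odisha '

parts = cleaned.split()
print(parts)          # ['Assam', 'Meghalaya', 'West', 'Bengal', 'Odisha']
```

The labels are the only reliable separators, so the split has to happen on the labels themselves rather than on whitespace.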
Answer:
You can use regex for this:
import re
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
pattern = re.compile(r"] (.*?)(?:\[|$)")
output = pattern.findall(string.strip())
print(output)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
- How it works: https://regex101.com/r/5peFyC/1
- re module
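To make the pattern easier to read, here is a sketch of the same regex written in re.VERBOSE mode with a comment on each piece. It should behave like the compact version; note that the `]` is escaped here (Python's re treats `\]` and a bare `]` outside a character class the same), and the literal space must be written as `\ ` because VERBOSE ignores unescaped whitespace:

```python
import re

pattern = re.compile(r"""
    \]\        # closing bracket of a label, then the space after it
    (.*?)      # lazily capture the entry text; findall returns this group
    (?:\[|$)   # stop at the next opening bracket or at the end of the string
""", re.VERBOSE)

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
output = pattern.findall(string.strip())
print(output)  # ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```

The strip() call matters: without it, the last capture runs up to the end of the string and keeps the trailing space ('Odisha ').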
Answer:
You can split on regex patterns using re.split:
import re
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print(re.split(r"\s*\[\w\]\s*", string.strip())[1:])
Note that we first remove the whitespace around the string with strip(), then use r"\s*\[\w\]\s*" to match option labels like [A] together with any surrounding spaces. Because the string starts with a label, the first element of the result is an empty string, so we drop it by slicing with [1:] at the end.
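A quick way to check both steps is to print the intermediate result of re.split before slicing, and to compare it with a split of the unstripped string. This sketch assumes the input from the question:

```python
import re

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

parts = re.split(r"\s*\[\w\]\s*", string.strip())
print(parts)       # ['', 'Assam', 'Meghalaya', 'West Bengal', 'Odisha']
print(parts[1:])   # ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

# Without strip(), the trailing space is not followed by a label, so it survives
unstripped = re.split(r"\s*\[\w\]\s*", string)
print(unstripped[-1])  # 'Odisha ' (with a trailing space)
```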
Answer:
This can be done with a one-line list comprehension, plus a special case for the last option:
option = ['[A]', '[B]', '[C]', '[D]']  # the option list from the question
res = [string[string.find(option[i]):string.find(option[i + 1])].split(option[i])[1].strip() for i in range(len(option) - 1)] + [string.split(option[-1])[1].strip()]
Broken down into a loop, with some explicit intermediate steps for readability:
res = []
for i in range(len(option) - 1):
    from_ind = string.find(option[i])      # start of the current label
    to_ind = string.find(option[i + 1])    # start of the next label
    sub_str = string[from_ind:to_ind]      # e.g. "[C] West Bengal"
    clean_sub_str = sub_str.split(option[i])[1].strip()
    res.append(clean_sub_str)
# Last option add-on: everything after the final label
res.append(string.split(option[-1])[1].strip())
print(res)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
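This is not as concise as the regex solutions, but it allows more flexibility in defining the "options": the labels are matched as literal strings, so they do not have to follow the [X] pattern. It does assume that every label appears in the string, in the order given in option.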
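The same flexibility can be combined with re.split by escaping each literal label with re.escape and joining them into an alternation. This is a sketch; split_on_labels is a hypothetical helper name, not part of any library. Unlike the find-based loop, it does not depend on the labels appearing in a particular order:

```python
import re

def split_on_labels(text, labels):
    # Hypothetical helper: build an alternation of the literal labels.
    # re.escape makes characters like "[" and "]" match literally.
    pattern = "|".join(re.escape(label) for label in labels)
    # Split on any label, strip each piece, and drop empty pieces
    return [part.strip() for part in re.split(pattern, text) if part.strip()]

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print(split_on_labels(string, ['[A]', '[B]', '[C]', '[D]']))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

# Labels do not have to follow the [X] shape
print(split_on_labels("(i) Assam (ii) Meghalaya (iii) West Bengal", ['(i)', '(ii)', '(iii)']))
# ['Assam', 'Meghalaya', 'West Bengal']
```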
Answer:
You can split your string with the regular-expression function re.split(), which is more powerful than Python's str.split(), and then clean up the result with a list comprehension.
This solution does not require modifying the input string before splitting, and it also works when the input contains extra whitespace scattered throughout, as demonstrated below:
import re
s = " [A] Assam[B] Meghalaya [C] West Bengal [D] Odisha "
print([r.strip() for r in re.split(r"\[[A-Z]\]", s) if r.strip()])
# gives: ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
The regex pattern r'\[[A-Z]\]' matches any label from '[A]' to '[Z]'.
- r.strip() removes any whitespace around each piece.
- if r.strip() drops empty strings and whitespace-only strings from the result of splitting.
- The backslashes in '\[' and '\]' are necessary because square brackets have a special meaning in regular expression patterns and must be escaped to match literally.
- [A-Z] matches any uppercase ASCII letter from A to Z.
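Since the comprehension above calls r.strip() twice per piece, a small variant strips each piece once in a generator and then filters. This is a sketch of an alternative, not the answer's original code. It also shows a limitation of [A-Z]: lowercase labels such as '[a]' are not matched, so re.split returns the input unchanged:

```python
import re

s = " [A] Assam[B] Meghalaya [C] West Bengal [D] Odisha "

# Strip each piece once, then keep only the non-empty ones
stripped = (part.strip() for part in re.split(r"\[[A-Z]\]", s))
result = [part for part in stripped if part]
print(result)  # ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

# [A-Z] only covers uppercase letters, so these labels are not split on
lower = re.split(r"\[[A-Z]\]", "[a] Assam[b] Goa")
print(lower)   # ['[a] Assam[b] Goa']
```

If lowercase or multi-letter labels are possible, a class such as [A-Za-z]+ could be used instead, at the cost of also matching any other bracketed word in the text.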