I have a string of this form:
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
Expected output: ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
I tried several approaches, but I always end up splitting the substring West Bengal into two parts, and I have not been able to handle that edge case.
What I tried was to pass the string into the function below and then split the result, but that does not work:
def remove_alpha(string):
    option = ['[A]', '[B]', '[C]', '[D]']
    res = ""
    for i in option:
        res = string.replace(i, '')
        string = res
    return res
CodePudding user response:
You can use regex for this:
import re
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha"
pattern = re.compile(r"] (.*?)(?:\[|$)")
output = pattern.findall(string)
print(output)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
- How it works: https://regex101.com/r/5peFyC/1
- See also: the documentation for Python's re module
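The pattern above can also be written in verbose form so that each piece is commented. This is only a sketch, not the answer's exact pattern: it swaps the non-capturing group for a lookahead and adds `\s*` on both sides of the capture, on the assumption that the input may carry a trailing space as in the question.

```python
import re

# Input from the question, including its trailing space.
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

pattern = re.compile(
    r"""
    \]\s*          # closing bracket of a label, then optional spaces
    (.*?)          # lazily capture the option text
    \s*(?=\[|$)    # stop before the next label or at the end, dropping trailing spaces
    """,
    re.VERBOSE,
)

output = pattern.findall(string)
print(output)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```

Because the lookahead does not consume the next `[`, each match starts cleanly at the following label.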
CodePudding user response:
You can split on a regex pattern using re.split:
import re
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print(re.split(r"\s*\[\w\]\s*", string.strip())[1:])
Note that we first remove the whitespace around the string with strip(), then use r"\s*\[\w\]\s*" to match option labels such as [A] together with any surrounding spaces. Since the first element of the result is an empty string, we drop it by slicing with [1:] at the end.
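If you would rather not rely on the empty string always being the first element, one possible variation is to filter out empty pieces instead of slicing. The helper name `split_options` is hypothetical and only used for illustration; the sketch assumes single-character labels such as [A], as in the question.

```python
import re

def split_options(text):
    """Split text on labels like [A] and drop any empty pieces."""
    parts = re.split(r"\s*\[\w\]\s*", text.strip())
    # Filtering avoids depending on where the empty string appears.
    return [part for part in parts if part]

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print(split_options(string))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```

Unlike the `[1:]` slice, this keeps any text that appears before the first label.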
CodePudding user response:
This can be done with a one-line list comprehension plus a special case for the last option (string and option are defined as in the question):
[string[string.find(option[i]):string.find(option[i + 1])].split(option[i])[1].strip() for i in range(len(option) - 1)] + [string.split(option[-1])[1].strip()]
Broken down into a loop, with some explicit intermediate steps for readability:
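# string and option as defined in the question
string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
option = ['[A]', '[B]', '[C]', '[D]']
res = []
for i in range(len(option) - 1):
    from_ind = string.find(option[i])      # start of the current label
    to_ind = string.find(option[i + 1])    # start of the next label
    sub_str = string[from_ind:to_ind]
    clean_sub_str = sub_str.split(option[i])[1].strip()
    res.append(clean_sub_str)
# Last option add-on
res.append(string.split(option[-1])[1].strip())
print(res)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']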
This is not as pretty as using regex, but allows for more flexibility in defining the "options".
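For comparison, a custom list of options can also be combined with a regex by escaping each label and joining them into an alternation. This is a sketch of a different technique from the loop above, and the function name `split_on_labels` is made up for this example.

```python
import re

def split_on_labels(text, labels):
    """Split text on any of the given literal labels and strip the pieces."""
    # Longest labels first, so a label is never shadowed by a shorter prefix.
    ordered = sorted(labels, key=len, reverse=True)
    alternation = "|".join(re.escape(label) for label in ordered)
    parts = re.split(rf"\s*(?:{alternation})\s*", text.strip())
    return [part for part in parts if part]

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
option = ['[A]', '[B]', '[C]', '[D]']
print(split_on_labels(string, option))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```

re.escape treats each label as literal text, so brackets or parentheses in the labels need no manual escaping.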
CodePudding user response:
Try to use regular expressions to extract what you want:
import re
s = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print( re.findall('] ([^[]*)', s) )
which gives:
['Assam', 'Meghalaya', 'West Bengal', 'Odisha ']
Note that the last item keeps the trailing space from the input string.
The regex pattern '] ([^[]*)' works as follows:
- It first searches for the literal string '] '.
- If that string is found, it must be followed by the pattern '([^[]*)'.
- [^[] means any character except '[' ('^' means "not").
- The * in [^[]* means any number of such characters.
- The parentheses in ([^[]*) mean that the enclosed pattern, when matched, is added to the list of results.
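Since the last match keeps the trailing space of the input, a small follow-up step can strip each result. The sketch below uses the same idea as the pattern above, with the brackets escaped for readability; the variable name `names` is just illustrative.

```python
import re

s = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

# Same pattern as above with the brackets escaped explicitly,
# followed by strip() on every captured piece.
names = [match.strip() for match in re.findall(r"\] ([^\[]*)", s)]
print(names)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```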