How to extract words from a string into a list in Python?

I have a string like this:

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

Output = ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

I tried many approaches, but I always end up splitting the substring West Bengal into two parts. I am not able to handle that edge case.

What I tried was to pass the string to the function below and then split the result, but it does not work:

def remove_alpha(string):

    option = ['[A]', '[B]', '[C]', '[D]']
    res = ""
    for i in option:
        res = string.replace(i, '')
        string = res
    return res
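
A minimal reproduction of the problem, assuming the split step is a plain str.split() on whitespace (the question does not show that call, so this is a guess at what was done):

```python
def remove_alpha(string):
    # Same logic as the function above: drop each option label
    for label in ['[A]', '[B]', '[C]', '[D]']:
        string = string.replace(label, '')
    return string


string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
cleaned = remove_alpha(string)

# Assumption: the split step was str.split() on whitespace
words = cleaned.split()
print(words)
# ['Assam', 'Meghalaya', 'West', 'Bengal', 'Odisha']
```

Once the labels are removed, nothing distinguishes the space inside "West Bengal" from the spaces between the names, so the labels themselves have to serve as the delimiters.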

CodePudding user response：

You can use regex for this:

import re

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
pattern = re.compile(r"] (.*?)(?:\[|$)")

output = pattern.findall(string.strip())
print(output)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

How it works: https://regex101.com/r/5peFyC/1
See also the documentation for Python's re module.
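
The pattern above expects exactly one space after each "]". If the input may have irregular spacing, a variant along these lines might be more forgiving. This is only a sketch: the pattern and the messy sample string are illustrative assumptions, not part of the original answer.

```python
import re

# Hypothetical variant of the pattern above, not the answerer's original:
# \s* tolerates any whitespace around the text, and the lookahead (?=...)
# checks for the next "[" (or the end of the string) without consuming it.
pattern = re.compile(r"\]\s*(.*?)\s*(?=\[|$)")

messy = "  [A] Assam[B] Meghalaya [C] West Bengal [D] Odisha  "
output = pattern.findall(messy)
print(output)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```

With this variant the call to strip() is no longer needed, because the surrounding whitespace is matched outside the capture group.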

CodePudding user response：

You can split on regex patterns using re.split:

import re


string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

print(re.split(r"\s*\[\w\]\s*", string.strip())[1:])
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

Note that we first remove the whitespace around the string with strip(), then use r"\s*\[\w\]\s*" to match option labels such as [A] together with any whitespace around them. Because the string starts with a label, the first element of the result is an empty string, which the [1:] slice removes.
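
To make the role of the slice visible, here is the same call with the intermediate list printed first (the variable names parts and names are only for illustration):

```python
import re

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

# The stripped string begins with "[A]", so re.split yields an empty
# string in front of the first delimiter.
parts = re.split(r"\s*\[\w\]\s*", string.strip())
print(parts)
# ['', 'Assam', 'Meghalaya', 'West Bengal', 'Odisha']

# Dropping that leading empty string leaves only the names.
names = parts[1:]
print(names)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```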

CodePudding user response：

This can be done in a one-line list comprehension, with the last option handled as a special case (with option = ['[A]', '[B]', '[C]', '[D]'] as in the question):

[string[string.find(option[i]):string.find(option[i + 1])].split(option[i])[1].strip() for i in range(len(option) - 1)] + [string.split(option[-1])[1].strip()]

Broken down into a loop, with some explicit intermediate steps for readability:

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
option = ['[A]', '[B]', '[C]', '[D]']

res = []
for i in range(len(option) - 1):
    from_ind = string.find(option[i])
    to_ind = string.find(option[i + 1])
    sub_str = string[from_ind:to_ind]
    clean_sub_str = sub_str.split(option[i])[1].strip()
    res.append(clean_sub_str)

# Last option add-on
res.append(string.split(option[-1])[1].strip())
print(res)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

This is not as pretty as using regex, but allows for more flexibility in defining the "options".
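
As a rough sketch of that flexibility, the same find-and-slice idea can be wrapped in a helper that accepts any list of labels. The function name split_by_options and the "Q1:" style labels are hypothetical, chosen only for illustration, and the sketch assumes each label occurs at most once in the text.

```python
def split_by_options(text, options):
    """Return the text that follows each label, in order of appearance."""
    # Locate every label that actually occurs; str.find returns -1 if absent.
    # Assumption: each label appears at most once in the text.
    found = sorted((text.find(o), o) for o in options if text.find(o) != -1)
    results = []
    for k, (start, label) in enumerate(found):
        # The segment runs from the end of this label to the next label's start
        end = found[k + 1][0] if k + 1 < len(found) else len(text)
        results.append(text[start + len(label):end].strip())
    return results


bracketed = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print(split_by_options(bracketed, ['[A]', '[B]', '[C]', '[D]']))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

# Labels do not have to be bracketed single letters
numbered = "Q1: Assam Q2: Meghalaya Q3: West Bengal Q4: Odisha"
print(split_by_options(numbered, ['Q1:', 'Q2:', 'Q3:', 'Q4:']))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```

Sorting by position means the labels can be passed in any order, and a label missing from the text is simply skipped rather than raising an IndexError.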

CodePudding user response：

You can split the string with re.split(), which is much more powerful than the str.split() method, and then clean up the result with a list comprehension.

This solution does not require modifying the input string before splitting, and it also works when whitespace is scattered throughout the input, as demonstrated below:

import re
s = "  [A] Assam[B] Meghalaya [C] West Bengal [D] Odisha  "
print([r.strip() for r in re.split(r"\[[A-Z]\]", s) if r.strip()])
# gives: ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

The regex pattern r'\[[A-Z]\]' splits on any label from '[A]' to '[Z]'.
r.strip() removes any whitespace surrounding each result.
if r.strip() drops empty strings and whitespace-only strings from the result of the split.
The backslashes in '\[' and '\]' are necessary because square brackets have a special meaning in regular expression patterns and must be escaped to match literally.
[A-Z] matches any single uppercase ASCII letter from A to Z.
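
A small illustration of why the escaping matters, using a shortened sample string chosen only for this example: without the backslashes, the outer brackets are not there at all and [A-Z] alone is a character class, so the split happens at every uppercase letter.

```python
import re

s = "[A] Assam[B] Odisha"

# Character class only: splits at every uppercase letter,
# including the "A" of "Assam" and the "O" of "Odisha".
unescaped = re.split(r"[A-Z]", s)
print(unescaped)
# ['[', '] ', 'ssam[', '] ', 'disha']

# Escaped outer brackets: splits only at the literal labels "[A]" and "[B]".
escaped = re.split(r"\[[A-Z]\]", s)
print(escaped)
# ['', ' Assam', ' Odisha']
```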