How to extract words from a string into a list in Python?

Time:11-07

I have a string like this:

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

Output = ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

I tried many ways, but I always end up splitting the substring West Bengal into two halves. I am not able to handle that edge case.

What I tried was to pass the string to the function below and then split the result, but that does not work:

def remove_alpha(string):

    option = ['[A]', '[B]', '[C]', '[D]']
    res = ""
    for i in option:
        res = string.replace(i, '')
        string = res
    return res

CodePudding user response:

You can use regex for this:

import re

string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha"
pattern = re.compile(r"] (.*?)(?:\[|$)")

output = pattern.findall(string)
print(output)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

CodePudding user response:

You can split on regex patterns using re.split:

import re


string = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "

print(re.split(r"\s*\[\w\]\s*", string.strip())[1:])

Note that we first remove the whitespace around the string with strip(), then split on r"\s*\[\w\]\s*", which matches markers such as [A] together with any whitespace around them. Because the string starts with a marker, the first element of the result is an empty string, so we drop it with the slice [1:] at the end.
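As a minimal sketch of the same idea, the split can be wrapped in a small helper that filters out empty pieces instead of slicing off the first element, so it also copes with input that does not begin with a marker. The name split_options is hypothetical and not part of the answer above.

```python
import re

def split_options(text):
    """Split on bracketed one-character markers like [A] and drop empty pieces.

    split_options is an illustrative name, not from the original answer.
    """
    parts = re.split(r"\s*\[\w\]\s*", text.strip())
    # Filtering empties replaces the [1:] slice used in the answer.
    return [part for part in parts if part]

print(split_options("[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```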

CodePudding user response:

This can be done with a one-line list comprehension, plus a special case for the last option (with option = ['[A]', '[B]', '[C]', '[D]'], as in the question):

[string[string.find(option[i]):string.find(option[i + 1])].split(option[i])[1].strip() for i in range(len(option) - 1)] + [string.split(option[-1])[1].strip()]

Broken down into a loop, with some explicit intermediate steps for readability:

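# Markers taken from the question
option = ['[A]', '[B]', '[C]', '[D]']

res = []
for i in range(len(option) - 1):
    # Slice from the current marker up to the start of the next one
    from_ind = string.find(option[i])
    to_ind = string.find(option[i + 1])
    sub_str = string[from_ind:to_ind]
    # Drop the marker itself and the surrounding whitespace
    clean_sub_str = sub_str.split(option[i])[1].strip()
    res.append(clean_sub_str)

# Last option add-on: everything after the final marker
res.append(string.split(option[-1])[1].strip())
print(res)
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']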

This is not as pretty as using a regex, but it allows more flexibility in defining the "options", since they can be arbitrary strings.
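To illustrate that flexibility, here is a rough sketch of the same find-and-slice approach as a reusable function that accepts any list of markers. The function name extract_between and the numbered markers in the second call are assumptions made for the example; it also assumes the markers appear in the text in the order given.

```python
def extract_between(text, markers):
    """Return the text after each marker, up to the next marker or the end.

    extract_between is a hypothetical helper; markers must occur in order.
    """
    positions = []
    start = 0
    for marker in markers:
        idx = text.find(marker, start)
        if idx == -1:
            raise ValueError(f"marker {marker!r} not found")
        positions.append(idx)
        start = idx + len(marker)

    results = []
    for i, (marker, idx) in enumerate(zip(markers, positions)):
        begin = idx + len(marker)
        # The last marker runs to the end of the text
        end = positions[i + 1] if i + 1 < len(positions) else len(text)
        results.append(text[begin:end].strip())
    return results

print(extract_between("[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha ",
                      ['[A]', '[B]', '[C]', '[D]']))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']

print(extract_between("1) Assam 2) Meghalaya 3) West Bengal 4) Odisha",
                      ['1)', '2)', '3)', '4)']))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```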

CodePudding user response:

Try to use regular expressions to extract what you want:

import re
s = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print( re.findall('] ([^[]*)', s) )

which gives:

['Assam', 'Meghalaya', 'West Bengal', 'Odisha ']

The pattern '] ([^[]*)' first searches for the literal text '] ' (a closing bracket followed by a space).

Once that is found, it must be followed by the group '([^[]*)'.

[^[] means any character except '[' (a leading '^' inside the brackets negates the set).

The * in [^[]* means zero or more of those characters.

The parentheses in '([^[]*)' form a capture group: when the pattern contains one group, findall returns only the text captured by that group, which is why the '] ' prefix does not appear in the results.
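Because the sample string ends with a space, the last match comes back as 'Odisha ' with trailing whitespace. One possible way to tidy that up, sketched here with a hypothetical helper named extract_names, is to strip each captured item. The brackets are escaped in this version purely for readability; the pattern matches the same text.

```python
import re

def extract_names(s):
    """Capture the text after each '] ' up to the next '[' and strip whitespace.

    extract_names is an illustrative name, not from the original answer.
    """
    return [item.strip() for item in re.findall(r"\] ([^\[]*)", s)]

s = "[A] Assam[B] Meghalaya[C] West Bengal[D] Odisha "
print(extract_names(s))
# ['Assam', 'Meghalaya', 'West Bengal', 'Odisha']
```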
