I have a list of strings that I need to filter in Python.
list=["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
"चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
"पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
"Gurav, Pune, Maharashtra,",
"411027",
"www"]
Desired output:
list=["Address: sr no94/1B/1/2/3",
"ashatvinayak chal, jay bhavani",
"411027 nagar, near bus stop, Pimple",
"Gurav, Pune, Maharashtra,",
"411027",
"www"]
My code:
import re

regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`.+,/\"]+")
for i in list:
    print(" ".join(regex.sub(' ', i).split()))
My output:
Himanshu Address sr no94/1B/1/2/3
, foo boo, , ashatvinayak chal, jay bhavani
, , , 411027 nagar, near bus stop, Pimple
Gurav, Pune, Maharashtra,
411027
www
I also want to remove English words such as Himanshu when they appear between non-English characters (e.g. पत्ता स नं Himanshu अष्टविनायक).
CodePudding user response:
Try this code:
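import re

lines = ["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
         "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
         "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
         "पिं Gurav, Pune, Maharashtra,",
         "411027",
         "www"]
cleaned = []
# Match the last run of disallowed (non-English) characters plus any commas or
# spaces after it; the negative lookahead rejects a run if another one follows.
pattern = r'[^a-zA-Z0-9!@\s:#$&()\-`.+,/"]+[, ]*(?!.*[^a-zA-Z0-9!@\s:#$&()\-`.+,/"]+[, ]*)'
for line in lines:
    match = re.search(pattern, line)
    if match:
        # Keep only the text after the last non-English run.
        cleaned.append(line[match.end():])
    else:
        # No non-English characters: keep the string unchanged.
        cleaned.append(line)
print(cleaned)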
Output:
['Address: sr no94/1B/1/2/3', 'ashatvinayak chal, jay bhavani', '411027 nagar, near bus stop, Pimple', 'Gurav, Pune, Maharashtra,', '411027', 'www']
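The same idea (drop everything up to and including the last non-English run) can probably be written more compactly. The sketch below is only an illustration and rests on one assumption that the answer above does not make: it treats every non-ASCII character as "non-English", whereas the pattern above uses an explicit whitelist and so also rejects some ASCII punctuation. The helper name `keep_ascii_tail` is made up for this example.

```python
import re

# Assumption: "non-English" means "non-ASCII" (code point above 0x7F).
# The greedy .* runs up to the LAST non-ASCII character; the trailing
# [\s,]* also swallows the separators that directly follow it.
_PREFIX = re.compile(r'^.*[^\x00-\x7F][\s,]*')


def keep_ascii_tail(text):
    """Return the part of `text` after its last non-ASCII character.

    Strings without any non-ASCII character are returned unchanged.
    """
    return _PREFIX.sub('', text, count=1)


rows = ["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
        "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
        "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
        "पिं Gurav, Pune, Maharashtra,",
        "411027",
        "www"]

cleaned = [keep_ascii_tail(row) for row in rows]
print(cleaned)
```

For the sample data this should give the same six strings as the output above. If characters such as `?` or `;` must also count as non-English, the whitelist-based pattern is the safer choice.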