How to filter string using regex?

Time:12-12

I have a list of strings that I need to filter in Python.

list=["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
       "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
       "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
       "Gurav, Pune, Maharashtra,",
       "411027",
       "www"]

The desired output is:

list=["Address: sr no94/1B/1/2/3",
      "ashatvinayak chal, jay bhavani",
      "411027 nagar, near bus stop, Pimple",
      "Gurav, Pune, Maharashtra,",
      "411027",
      "www"]

My code

import re
# Replace every run of characters outside the allowed set with a space.
regex = re.compile("[^a-zA-Z0-9!@#$&()\\-`. ,/\"]+")
for i in list:
    print(" ".join(regex.sub(' ', i).split()))

My output

Himanshu Address sr no94/1B/1/2/3
, foo boo, , ashatvinayak chal, jay bhavani
, , , 411027 nagar, near bus stop, Pimple
Gurav, Pune, Maharashtra,
411027
www

I also want to remove English text such as "Himanshu" when it appears between non-English characters (e.g. पत्ता स नं Himanshu अष्टविनायक).

CodePudding user response:

Try this code:

import re
list = ["पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
        "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
        "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
        "पिं Gurav, Pune, Maharashtra,",
        "411027",
        "www"]
list2 = []
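# Match the last run of disallowed characters (plus trailing commas/spaces):
# the negative lookahead rejects any run that has another disallowed character after it.
pattern = r'[^a-zA-Z0-9!@\s:#$&()\-`. ,/"]+[, ]*(?!.*[^a-zA-Z0-9!@\s:#$&()\-`. ,/"]+[, ]*)'
for i in list:
    st = re.findall(pattern, i)
    if st:
        # Keep only the text that follows the matched run.
        list2.append(i[i.index(st[0]) + len(st[0]):])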
    else:
        list2.append(i)
print(list2)

Output:
['Address: sr no94/1B/1/2/3', 'ashatvinayak chal, jay bhavani', '411027 nagar, near bus stop, Pimple', 'Gurav, Pune, Maharashtra,', '411027', 'www']
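
A possible simplification, offered as a sketch rather than a definitive fix: since the goal is to keep only what follows the last non-English run, a single anchored `re.sub` with a greedy `.*` can do the same job without a lookahead or `str.index`. The names `ALLOWED`, `PREFIX`, `strip_non_english_prefix` and `samples` are made up for this example, and the allowed character set is assumed to be the one used in the answer above.

```python
import re

# Assumption: the same "English" character set as in the answer above.
ALLOWED = r'a-zA-Z0-9!@\s:#$&()\-`. ,/"'

# Greedy ".*" backtracks to the LAST character outside the allowed set;
# "[, ]*" then consumes the commas and spaces that trail it.
PREFIX = re.compile(rf'^.*[^{ALLOWED}][, ]*')


def strip_non_english_prefix(text):
    """Drop everything up to and including the last disallowed character."""
    # Strings with no disallowed character do not match and come back unchanged.
    return PREFIX.sub('', text)


samples = [
    "पत्ता स नं Himanshu अष्टविनायक Address: sr no94/1B/1/2/3",
    "चाळ, जय foo boo, बस स्टोप जवळ, ashatvinayak chal, jay bhavani",
    "पिंपळे गुरव, पुणे, महाराष्ट्र, 411027 nagar, near bus stop, Pimple",
    "पिं Gurav, Pune, Maharashtra,",
    "411027",
    "www",
]

print([strip_non_english_prefix(s) for s in samples])
```

Because the substitution removes the matched prefix directly, it does not depend on `i.index(st[0])` finding the right occurrence of the matched text. Note that `.` does not match newlines, so this sketch assumes each string is a single line.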
