How to remove a line if it doesn't contain a letter in Python

Time:11-22

I want to remove a line from a string if it doesn't contain any letter, and keep it if it contains letters or numbers. I am trying to solve this problem with a regex in Python, but I am unable to remove the lines. Example:

string='''हिरासत में ली गई महिला 36 वर्षीय नूर सजात कमरुज़्ज़मा थीं
          British High Commissioner Greets
          
          पत्ता आंबेडकर चौक, निमशीरगाव,
          निमिशरगाव, निरमशिरगाव, कोल्हापूर, NIMSHIRGAON, Nimshirgaon,
          Address: ambedkar chowk,
          महाराष्ट्र, 416101
          Nimshirgaon, Kolhapur, Maharashtra,
          416101
          1832
          1947'''

The output I want

output=  '''British High Commissioner Greets
           Address: ambedkar chowk,
           Nimshirgaon, Kolhapur, Maharashtra,
           416101
           1832
           1947'''

Please help me out!

CodePudding user response:

You can use a simple comprehension with a regex to keep only the lines that consist entirely of ASCII characters:

import re
# keep only lines made up entirely of ASCII characters (code points 0x00-0x7F)
out = '\n'.join(s for s in string.split('\n') if re.match(r'^[\x00-\x7F]+$', s))
print(out)

Output (the whitespace-only line is kept because spaces are ASCII too):

          British High Commissioner Greets
          
          Address: ambedkar chowk,
          Nimshirgaon, Kolhapur, Maharashtra,
          416101
          1832
          1947
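
If the whitespace-only line should be dropped too, as in the output the question asks for, one option is to skip the regex and use str.isascii() (Python 3.7+) together with strip(). This is only a minimal sketch: the function name keep_ascii_lines and the short sample string are made up for illustration, and stripping the indentation from each kept line is an assumption about what is wanted.

```python
def keep_ascii_lines(text: str) -> str:
    """Keep only non-blank lines made up entirely of ASCII characters."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        # str.isascii() is True for the empty string, so blank lines
        # are excluded explicitly by the truthiness check.
        if stripped and stripped.isascii():
            kept.append(stripped)
    return "\n".join(kept)


# Shortened, hypothetical sample in the spirit of the question's input.
sample = (
    "हिरासत में ली गई महिला 36\n"
    "  British High Commissioner Greets\n"
    "   \n"
    "  महाराष्ट्र, 416101\n"
    "  416101"
)
print(keep_ascii_lines(sample))
```

Lines that mix Devanagari and digits are removed as well, because every character on the line has to be ASCII.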

CodePudding user response:

Here you go. Any line that does not match the pattern is left out of the output. You can add more characters to the pattern depending on what you want.

I think this will solve your problem.

import re

# one or more allowed characters; pattern.match() tests this at the start of each line
pattern = re.compile("[a-zA-Z0-9!@#$&()\\-`.+,/\"]+")

multilinestring = '''हिरासत में ली गई महिला 36 वर्षीय नूर सजात कमरुज़्ज़मा थीं
British High Commissioner Greets        
पत्ता आंबेडकर चौक, निमशीरगाव,
निमिशरगाव, निरमशिरगाव, कोल्हापूर, NIMSHIRGAON, Nimshirgaon,
Address: ambedkar chowk,
महाराष्ट्र, 416101
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947'''

split_list = multilinestring.splitlines()
output_list = []
for line in split_list:
    # keep the line only if it starts with allowed characters
    if pattern.match(line):
        output_list.append(line)

print(*output_list, sep = "\n")

The last line prints each string in the list on a separate line. Here is the output:

British High Commissioner Greets        
Address: ambedkar chowk,
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947
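
One thing to watch with this approach: re.match only tests the beginning of a line, so a line that starts with allowed characters and switches to another script later is still kept. If the whole line has to consist of allowed characters, fullmatch may be a better fit. A small sketch of the difference follows. The name allowed, the colon and space added to the character class, and the mixed sample line are assumptions made for illustration and are not part of the answer above.

```python
import re

# Character class from the answer above, plus ':' and ' ' (an assumption)
# so that whole lines such as "Address: ambedkar chowk," can match.
allowed = re.compile(r'[a-zA-Z0-9!@#$&()\-`.+,/": ]+')

# Hypothetical line that starts in Latin script and ends in Devanagari.
mixed = "Nimshirgaon, कोल्हापूर"

starts_ok = allowed.match(mixed) is not None      # only the start is tested
whole_ok = allowed.fullmatch(mixed) is not None   # the entire line is tested
print(starts_ok, whole_ok)
```

Here match accepts the mixed line and fullmatch rejects it, so which one to use depends on whether partly non-English lines should be kept.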