I want to remove every line of a string that contains non-English (here, Devanagari) characters, and keep the lines made up only of English letters, numbers, and punctuation. I am trying to solve this with regex in Python, but I am unable to remove the lines. Example:
string='''हिरासत में ली गई महिला 36 वर्षीय नूर सजात कमरुज़्ज़मा थीं
British High Commissioner Greets
पत्ता आंबेडकर चौक, निमशीरगाव,
निमिशरगाव, निरमशिरगाव, कोल्हापूर, NIMSHIRGAON, Nimshirgaon,
Address: ambedkar chowk,
महाराष्ट्र, 416101
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947'''
The output I want:
output= '''British High Commissioner Greets
Address: ambedkar chowk,
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947'''
Please help me out!
Answer:
You can use a generator expression with a regex that keeps only the lines made up entirely of ASCII characters:
import re
# keep a line only if every character in it is ASCII (code points 0-127)
out = '\n'.join(s for s in string.split('\n') if re.match(r'^[\x00-\x7F]+$', s))
print(out)
output:
British High Commissioner Greets
Address: ambedkar chowk,
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947
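If the only goal is to keep lines made up entirely of ASCII characters, Python 3.7+ also offers `str.isascii()`, which avoids the regex altogether. A minimal sketch, using a shortened copy of the sample text from the question (the variable name `text` is just for illustration):

```python
text = '''हिरासत में ली गई महिला 36 वर्षीय नूर सजात कमरुज़्ज़मा थीं
British High Commissioner Greets
महाराष्ट्र, 416101
Nimshirgaon, Kolhapur, Maharashtra,
416101'''

# str.isascii() (Python 3.7+) is True only when every character is ASCII.
# It is also True for an empty string, so blank lines would be kept.
out = '\n'.join(line for line in text.splitlines() if line.isascii())
print(out)
```

One difference from the regex above: `[\x00-\x7F]+` needs at least one character, so the regex version drops empty lines, while `isascii()` keeps them.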
Answer:
Here you go. Any line that does not match the pattern is not appended to the output; you can add more characters to the pattern depending on what you want to keep.
I think this will solve your problem.
import re
# allowed characters; ':' is included so that lines such as
# "Address: ambedkar chowk," are kept
pattern = re.compile(r'[a-zA-Z0-9!@#$&()\-`. ,/":]+')
multilinestring = '''हिरासत में ली गई महिला 36 वर्षीय नूर सजात कमरुज़्ज़मा थीं
British High Commissioner Greets
पत्ता आंबेडकर चौक, निमशीरगाव,
निमिशरगाव, निरमशिरगाव, कोल्हापूर, NIMSHIRGAON, Nimshirgaon,
Address: ambedkar chowk,
महाराष्ट्र, 416101
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947'''
split_list = multilinestring.splitlines()
output_list = []
for line in split_list:
    # fullmatch() requires the WHOLE line to consist of allowed characters;
    # match() would only check that the line starts with one of them
    if pattern.fullmatch(line):
        output_list.append(line)
print(*output_list, sep="\n")
The last line prints each string in the list on a separate line. Here is the output:
British High Commissioner Greets
Address: ambedkar chowk,
Nimshirgaon, Kolhapur, Maharashtra,
416101
1832
1947
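Both answers filter the lines in Python code. If you would rather delete the unwanted lines with a single regex substitution, `re.sub` with the `re.MULTILINE` flag can do it. A minimal sketch, assuming "unwanted" means any line containing at least one non-ASCII character (the same criterion as the first answer), on a shortened copy of the sample text:

```python
import re

text = '''हिरासत में ली गई महिला 36 वर्षीय नूर सजात कमरुज़्ज़मा थीं
British High Commissioner Greets
पत्ता आंबेडकर चौक, निमशीरगाव,
Address: ambedkar chowk,
महाराष्ट्र, 416101
Nimshirgaon, Kolhapur, Maharashtra,
416101'''

# With re.MULTILINE, ^ matches at the start of every line. The pattern matches
# a whole line containing at least one non-ASCII character, plus the newline
# that ends it, so that line disappears from the result.
# "." never matches "\n", so a match cannot run past the end of its line.
cleaned = re.sub(r'^.*[^\x00-\x7F].*\n?', '', text, flags=re.MULTILINE)
print(cleaned)
```

If the very last line is the one removed, the newline before it is left behind as a trailing `"\n"`; call `.rstrip('\n')` on the result if that matters.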