I want to split news articles into sentences and ran into this problem. How do I replace every dot except those inside a directly quoted sentence, using iteration or regex? Thanks.
Input
Anna said "I want to eat. But I am not hungry.". She is a weird person.
Output
Anna said "I want to eat. But I am not hungry." \n She is a weird person \n
CodePudding user response:
One trick we can try uses re.sub with the regex pattern ".*?"|\.\s* . Scanning left to right, the pattern consumes each quoted term as a single match, so the \.\s* alternative can only match a dot that lies outside quotes. In the lambda callback, we replace only the dot matches with a newline and return the quoted matches unchanged.
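import re

inp = 'Anna said "I want to eat. But I am not hungry.". She is a weird person.'
# Quoted matches are returned unchanged; a bare dot (plus trailing whitespace) becomes a newline.
output = re.sub(r'".*?"|\.\s*', lambda m: '\n' if m.group().strip() == '.' else m.group(), inp)
print(output)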
This prints:
Anna said "I want to eat. But I am not hungry."
She is a weird person
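To see why the quoted dots survive, it can help to list what the pattern actually matches. The following is only an illustrative sketch that reuses the same pattern and input; the names PATTERN and matches are made up for this example.

```python
import re

# Same pattern as above: a quoted run first, otherwise a dot plus trailing whitespace.
PATTERN = re.compile(r'".*?"|\.\s*')

inp = 'Anna said "I want to eat. But I am not hungry.". She is a weird person.'

# Collect every match in order to see which alternative fired each time.
matches = [m.group() for m in PATTERN.finditer(inp)]
for text in matches:
    kind = "quoted, kept" if text.startswith('"') else "dot, replaced"
    print(repr(text), "->", kind)
```

The whole quoted part comes back as one match, so the two dots inside it are never seen by the \.\s* alternative.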
CodePudding user response:
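strings = 'Anna said "I want to eat. But I am not hungry.". She is a weird person.'
output = ""
# Splitting on the double quote puts unquoted text at even indices and quoted text at odd ones.
for i, string in enumerate(strings.split('"')):
    if i % 2 == 0:
        # Outside quotes: turn each dot into a newline.
        string = string.replace('.', '\n')
    else:
        # Inside quotes: restore the quotes that split() removed and keep the dots.
        string = '"' + string + '"'
    output += string
print(output)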
This gives (the space that followed the replaced dot is kept, so the second line starts with a space):
Anna said "I want to eat. But I am not hungry."
 She is a weird person
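Since the goal in the question is to split the text into sentences, the same even/odd idea could also return a list instead of a newline-joined string. This is a rough sketch, not part of either answer: split_sentences is a hypothetical helper, and it assumes the double quotes in the text are balanced.

```python
def split_sentences(text):
    """Split on dots outside double quotes (hypothetical helper; assumes balanced quotes)."""
    sentences = []
    current = ""
    for i, chunk in enumerate(text.split('"')):
        if i % 2 == 1:
            # Quoted chunk: put the quotes back and keep its dots untouched.
            current += '"' + chunk + '"'
            continue
        parts = chunk.split('.')
        # Every part except the last one ends at an unquoted dot, i.e. a sentence boundary.
        for part in parts[:-1]:
            current += part
            if current.strip():
                sentences.append(current.strip())
            current = ""
        # The last part has no closing dot yet, so carry it over.
        current += parts[-1]
    if current.strip():
        sentences.append(current.strip())
    return sentences


text = 'Anna said "I want to eat. But I am not hungry.". She is a weird person.'
result = split_sentences(text)
print(result)
```

Each sentence is stripped of surrounding whitespace, and the terminating dot is dropped just as in the outputs above.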