I want to get only the complete words that an acronym in parentheses stands for.
For example, given the sentence 'Lung cancer screening (LCS) reduces NSCLC mortality', I want to get 'Lung cancer screening' as the result.
How can I do it with regex?
Original question: I want to remove runs of uppercase letters: "HIV acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer" => " acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer"
Answer:
Assuming you want to target two or more capital letters in a row, I would use re.sub here:
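import re

inp = "Lung cancer screening (LCS) reduces NSCLC mortality"
# Replace a parenthesized acronym like "(LCS)" or a bare run of 2+ capitals
# like "NSCLC", together with the surrounding whitespace, by a single space.
output = re.sub(r'\s*(?:\([A-Z]+\)|[A-Z]{2,})\s*', ' ', inp).strip()
print(output)  # Lung cancer screening reduces mortality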
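The same pattern should also handle the sentence from the original question, since a leading "HIV " is just a bare run of capitals followed by whitespace. A small sketch that precompiles the pattern into a reusable helper (the name strip_acronyms is hypothetical, not from the answer):

```python
import re

# Pattern from the answer above: a parenthesized run of capitals, or a bare
# run of two or more capitals, plus any surrounding whitespace.
ACRONYM = re.compile(r'\s*(?:\([A-Z]+\)|[A-Z]{2,})\s*')

def strip_acronyms(text):
    """Hypothetical helper: collapse each acronym to one space, then trim."""
    return ACRONYM.sub(' ', text).strip()

print(strip_acronyms('HIV acquired immunodeficiency syndrome are at a '
                     'particularly high risk of cervical cancer'))
# acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer
```

Single capitals such as the "L" in "Lung" are left alone because the bare alternative requires at least two uppercase letters.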
Answer:
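import re

s = 'HIV acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer'
# Remove runs of two or more uppercase letters. A plain replacement string is
# enough (no lambda needed), and requiring {2,} keeps single capitals such as
# the "L" in "Lung", which a bare [A-Z] would also delete.
print(re.sub(r'[A-Z]{2,}', '', s).strip())  # acquired immunodeficiency syndrome ...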
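Neither answer returns the expansion itself ('Lung cancer screening'), which is what the updated question asks for. A minimal sketch, assuming each acronym letter is the initial of one word and those words come right before the opening parenthesis; regex finds the "(ACRONYM)" and the initials check rejects non-matching phrases. The function name expand_acronyms is hypothetical:

```python
import re

def expand_acronyms(text):
    """Map each parenthesized acronym to the words just before it.

    Assumption: one word per acronym letter, immediately preceding '('.
    """
    result = {}
    for m in re.finditer(r'\(([A-Z]{2,})\)', text):
        acronym = m.group(1)
        # Words before the opening parenthesis; take as many as the acronym has letters.
        words = text[:m.start()].split()
        candidate = words[-len(acronym):]
        # Accept the candidate only if its initials spell the acronym.
        initials = ''.join(w[0] for w in candidate).upper()
        if len(candidate) == len(acronym) and initials == acronym:
            result[acronym] = ' '.join(candidate)
    return result

print(expand_acronyms('Lung cancer screening (LCS) reduces NSCLC mortality'))
# {'LCS': 'Lung cancer screening'}
```

Acronyms whose letters do not map one-to-one onto words (for example "NSCLC" for "non-small cell lung cancer") would need a looser matching heuristic than this.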