How to match complete words for acronym using regex?

Time:11-28

I want to extract only the full phrase that a parenthesized acronym stands for.

For example, given the sentence 'Lung cancer screening (LCS) reduces NSCLC mortality', I want to get 'Lung cancer screening' as the result.

How can I do this with a regex?


Original question: I want to remove runs of uppercase letters: "HIV acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer" => " acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer"

CodePudding user response:

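Assuming you want to target runs of two or more capital letters, as well as any capitalized acronym in parentheses, I would use re.sub here:

import re

inp = "Lung cancer screening (LCS) reduces NSCLC mortality"
# Replace "(ACRONYM)" or a bare run of 2+ capitals, plus surrounding whitespace, with one space
output = re.sub(r'\s*(?:\([A-Z]+\)|[A-Z]{2,})\s*', ' ', inp).strip()
print(output)  # Lung cancer screening reduces mortality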
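
The substitution above removes acronyms rather than returning the words an acronym stands for. If the goal is the phrase in front of "(LCS)", one possible approach is to take as many preceding words as the acronym has letters and check that their initials match. This is only a sketch: the function name `find_expansions` is made up for illustration, and it assumes every word of the phrase contributes exactly one letter to the acronym, so phrases with skipped words such as "of" or "and" will not be found.

```python
import re

def find_expansions(text):
    """Map each parenthesized acronym to the words right before it,
    but only when the initials of those words spell the acronym."""
    found = {}
    for m in re.finditer(r'\(([A-Z]{2,})\)', text):
        acronym = m.group(1)
        # All words that appear before the opening parenthesis
        words = re.findall(r'[A-Za-z][\w-]*', text[:m.start()])
        # Take one word per acronym letter, counting back from the parenthesis
        candidate = words[-len(acronym):]
        initials = ''.join(w[0].upper() for w in candidate)
        if len(candidate) == len(acronym) and initials == acronym:
            found[acronym] = ' '.join(candidate)
    return found

sentence = 'Lung cancer screening (LCS) reduces NSCLC mortality'
print(find_expansions(sentence))  # {'LCS': 'Lung cancer screening'}
```

NSCLC is ignored here because it is not wrapped in parentheses, and an acronym whose preceding words do not match its letters is simply left out of the result.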

CodePudding user response:

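import re
s = 'HIV acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer'
# Remove standalone runs of two or more capital letters (acronyms such as "HIV");
# matching a single [A-Z] would also strip the first letter of any capitalized word
print(re.sub(r'\b[A-Z]{2,}\b', '', s).strip())  # acquired immunodeficiency syndrome are at a particularly high risk of cervical cancer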

Based on @jensgram's answer.
