I am trying to replace abbreviations in the text using Python without changing the sentence structure including whitespace.
I have created a dictionary that maps each abbreviation to its replacement:
replacers = {
    'aaa': 'abdominal aortic aneurysm',
    'taa': 'thoracic aortic aneurysm',
    'clti': 'chronic limb threatening ischaemia',
}
I have my text coming from a text area in a form called 'note'.
if request.method == "POST":
    text = request.POST.get("note")
I have created this function to replace the abbreviations.
# Replace each abbreviation with its expansion
def acronym(replacers, text):
    # split() with no arguments splits on any run of whitespace
    return ' '.join([replacers.get(i, i) for i in text.split()])
It works well, but it removes all newlines and tabs, which makes the text difficult to read. Is there an elegant way to rewrite the function so that the whitespace is preserved?
Many thanks.
CodePudding user response:
You can accomplish this in multiple ways. The most straightforward is:
# Replace abbreviations with plain string substitution
def acronym(replacers, text):
    for rk in replacers.keys():
        text = text.replace(rk, replacers[rk])
    return text
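One caveat with this version is that str.replace matches substrings, so a key that happens to sit inside a longer word is replaced too. A minimal illustration, where the sample text and the name acronym_naive are made up for the example:

```python
# Hypothetical one-entry dictionary and sample text, for illustration only
replacers = {"taa": "thoracic aortic aneurysm"}

def acronym_naive(replacers, text):
    # Plain substring replacement: whitespace is preserved,
    # but matches inside longer words are replaced as well
    for key, value in replacers.items():
        text = text.replace(key, value)
    return text

sample = "taa seen;\n\tno staab wound"
result = acronym_naive(replacers, sample)
# The newline and tab survive, but "staab" is altered
# because it contains the substring "taa"
print(repr(result))
```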
Using a regular expression helps ensure the "perfect" match:
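import re

# Replace abbreviations that are delimited by whitespace or the string boundaries
def acronym(replacers, text):
    for rk in replacers.keys():
        # \1 and \3 put back the whitespace captured around the abbreviation
        text = re.sub(r"(^|\s)(" + re.escape(rk) + r")($|\s)", r"\1" + replacers[rk] + r"\3", text)
    return text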
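Note: the above regex requires the keyword to be preceded by the start of the string or a whitespace character, and followed by the end of the string or a whitespace character. The captured whitespace is written back unchanged, so newlines and tabs are preserved. It can still be improved: for example, an abbreviation followed directly by punctuation is not matched.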
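One possible refinement, sketched here under the assumption that abbreviations are always delimited by whitespace: lookarounds check the neighbouring characters without consuming them, so two abbreviations separated by a single space are both replaced. The function name acronym_lookaround is made up for this example.

```python
import re

def acronym_lookaround(replacers, text):
    for key, value in replacers.items():
        # (?<!\S) : not preceded by a non-whitespace character
        # (?!\S)  : not followed by a non-whitespace character
        pattern = r"(?<!\S)" + re.escape(key) + r"(?!\S)"
        # A function as replacement avoids backslash processing in the value
        text = re.sub(pattern, lambda m, v=value: v, text)
    return text

# Hypothetical sample text with a space, a newline and a tab
sample = "aaa aaa\n\taaa"
result = acronym_lookaround({"aaa": "abdominal aortic aneurysm"}, sample)
print(repr(result))
```

As with the version above, an abbreviation followed directly by punctuation (such as "aaa.") is left unchanged by this pattern.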
CodePudding user response:
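I found that a regular expression with word boundaries (\b) works for this situation. It matches each abbreviation only as a whole word and leaves the surrounding whitespace untouched.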
import re

# Replace abbreviations that appear as whole words
def acronym(replacers, text):
    for i, x in replacers.items():
        text = re.sub(rf'\b{i}\b', x, text)
    return text
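A variation on the same idea, as a rough sketch rather than a definitive implementation: compile all keys into one alternation and look up each match in the dictionary, so the text is scanned once and an already inserted expansion is never re-examined. The names build_acronym_replacer, expand and the sample note are made up for this example, and matching is assumed to be case-sensitive.

```python
import re

replacers = {
    'aaa': 'abdominal aortic aneurysm',
    'taa': 'thoracic aortic aneurysm',
    'clti': 'chronic limb threatening ischaemia',
}

def build_acronym_replacer(replacers):
    # Longest keys first, so a longer key wins over a shorter one it starts with
    keys = sorted(replacers, key=len, reverse=True)
    # re.escape guards against keys that contain regex metacharacters
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")

    def replace(text):
        # Each whole-word match is looked up in the dictionary
        return pattern.sub(lambda m: replacers[m.group(0)], text)

    return replace

expand = build_acronym_replacer(replacers)
note = "Hx of aaa.\n\tQuery clti, no taa."
expanded = expand(note)
print(expanded)
```

Because \b also matches next to punctuation, "aaa." and "clti," are expanded here, and the newline and tab in the note are kept as they are.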