Replacing abbreviations in text using Python without changing white space

I am trying to replace abbreviations in the text using Python without changing the sentence structure including whitespace.

I have created a dictionary that maps each abbreviation to its replacement:

replacers = {
    'aaa': 'abdominal aortic aneurysm',
    'taa': 'thoracic aortic aneurysm',
    'clti': 'chronic limb threatening ischaemia',
}

I have my text coming from a text area in a form called 'note'.

if request.method == "POST":
    text = request.POST.get("note")

I have created this function to replace the abbreviations.

# remove abbreviations function
def acronym(replacers, text):
    return ' '.join([replacers.get(i, i) for i in text.split()])

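It replaces the abbreviations correctly, but because split() and ' '.join() collapse every run of whitespace into a single space, all newlines and tabs are lost and the text becomes difficult to read. Is there an elegant way to rewrite the above function so that the whitespace is preserved?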

Many thanks.

CodePudding user response：

You can accomplish this in multiple ways. The most straightforward is:

# remove abbreviations function
def acronym(replacers, text):
    for rk in replacers.keys():
        text = text.replace(rk, replacers[rk])
    return text
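
One caveat worth seeing in action: str.replace matches substrings, so it keeps newlines and tabs intact but also rewrites an abbreviation that happens to sit inside a longer word. A small demo, assuming a made-up input string and a renamed copy of the function (acronym_replace is just a label for this example):

```python
def acronym_replace(replacers, text):
    # Same logic as the str.replace version above, renamed for this demo.
    for abbreviation, expansion in replacers.items():
        text = text.replace(abbreviation, expansion)
    return text


demo = {"taa": "thoracic aortic aneurysm"}

# The newline and tab survive, but "staab" contains "taa" and is rewritten too.
result = acronym_replace(demo, "taa\n\tDr staab")
```

If that kind of accidental match is a concern for your notes, a pattern that checks what surrounds the abbreviation is the safer choice.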

Using a regular expression helps ensure the "perfect" match:

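import re

# remove abbreviations function
def acronym(replacers, text):
    for rk in replacers.keys():
        # Groups 1 and 3 capture the surrounding whitespace so it can be put back unchanged.
        text = re.sub(r"(^|\s)(" + re.escape(str(rk)) + r")($|\s)", r"\1" + str(replacers[rk]) + r"\3", text)
    return text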

Note: the above regex requires the keyword to be preceded by the start of the text or a whitespace character, and followed by the end of the text or a whitespace character. This can be improved anyway :)
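
One possible improvement, as a sketch rather than a definitive version: because the groups above consume the whitespace on both sides, an input like "aaa aaa" only gets its first abbreviation replaced (the single space is used up by the first match). Lookarounds check the neighbours without consuming them, and one compiled alternation handles every key in a single pass. The name acronym_lookaround is only a label for this example.

```python
import re


def acronym_lookaround(replacers, text):
    if not replacers:
        return text  # nothing to replace; avoids building an empty pattern
    # Longest keys first, so a longer abbreviation is tried before a shorter one.
    keys = sorted(replacers, key=len, reverse=True)
    alternation = "|".join(re.escape(key) for key in keys)
    # (?<!\S) and (?!\S): no non-whitespace character directly before or after.
    pattern = re.compile(r"(?<!\S)(?:" + alternation + r")(?!\S)")
    # A function replacement inserts the expansion literally.
    return pattern.sub(lambda match: replacers[match.group(0)], text)
```

Like the regex above, this still leaves an abbreviation alone when punctuation touches it (for example "taa,").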

CodePudding user response：

I found that using a regular expression with word boundaries (\b) works for this situation.

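import re

# remove abbreviations function
def acronym(replacers, text):
    for i, x in replacers.items():
        # \b matches only at word boundaries; re.escape guards keys containing regex characters
        text = re.sub(rf'\b{re.escape(i)}\b', x, text)
    return text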
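
For comparison, the one-liner from the question can also be kept almost as it was. This is a minimal sketch, assuming whole whitespace-separated tokens are all that needs matching: re.split with a capturing group returns the whitespace runs along with the words, so joining with an empty string puts every newline and tab back. The name acronym_keep_whitespace and the sample text are made up for the example.

```python
import re


def acronym_keep_whitespace(replacers, text):
    # The capturing group makes re.split keep each whitespace run in the result,
    # so joining the pieces with '' reproduces the original spacing exactly.
    pieces = re.split(r"(\s+)", text)
    return "".join(replacers.get(piece, piece) for piece in pieces)


sample = "aaa\n\ttaa  repair"
expanded = acronym_keep_whitespace(
    {"aaa": "abdominal aortic aneurysm", "taa": "thoracic aortic aneurysm"},
    sample,
)
```

As with the original function, a token with punctuation attached (such as "aaa.") is not a dictionary key and stays unchanged, which is where the \b version does better.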