Replacing abbreviations in text using Python without changing white space

Time:10-02

I am trying to replace abbreviations in the text using Python without changing the sentence structure including whitespace.

I have created a dictionary that maps each abbreviation to its replacement:

replacers = {
    'aaa': 'abdominal aortic aneurysm',
    'taa': 'thoracic aortic aneurysm',
    'clti': 'chronic limb threatening ischaemia',
}

I have my text coming from a text area in a form called 'note'.

if request.method == "POST":
    text = request.POST.get("note")

I have created this function to replace the abbreviations.

# remove abbreviations function
def acronym(replacers, text):
     return ' '.join([replacers.get(i, i) for i in text.split()])

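It replaces the abbreviations correctly, but it removes all newlines and tabs, which makes the text difficult to read: text.split() splits on any run of whitespace and ' '.join() puts a single space back. Is there an elegant way to write the above function so that the original whitespace is kept?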

Many thanks.

CodePudding user response:

You can accomplish this in multiple ways. The most straightforward is str.replace, although it also replaces an abbreviation that occurs inside a longer word:

# remove abbreviations function
def acronym(replacers, text):
    for rk in replacers.keys():
        text = text.replace(rk, replacers[rk])
    return text

Using a regular expression lets you match the abbreviation only when it stands on its own:

import re

# replace abbreviations function
def acronym(replacers, text):
    for rk in replacers.keys():
        # re.escape keeps any regex metacharacters in the key literal
        text = re.sub(r"(^|\s)(" + re.escape(str(rk)) + r")($|\s)", r"\1" + str(replacers[rk]) + r"\3", text)
    return text

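Note: the regex above only matches the keyword when it is preceded by the start of the text or a white-space character and followed by the end of the text or a white-space character. It can still be improved: because the surrounding white space is part of the match, the second of two identical abbreviations separated by a single white-space character is not replaced.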
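One possible improvement, as a minimal sketch rather than a definitive version: swap the captured white space for the lookarounds (?<!\S) and (?!\S), which check the neighbouring characters without consuming them. The sample text and the names abbreviation, expansion, sample and result are made up for illustration.

```python
import re

def acronym(replacers, text):
    for abbreviation, expansion in replacers.items():
        # (?<!\S) and (?!\S) require "no non-space character" on either side,
        # so the surrounding white space is checked but never consumed.
        pattern = r"(?<!\S)" + re.escape(abbreviation) + r"(?!\S)"
        # A function replacement inserts the expansion literally,
        # even if it happens to contain backslashes.
        text = re.sub(pattern, lambda match, e=expansion: e, text)
    return text

# Illustrative data only
replacers = {
    "aaa": "abdominal aortic aneurysm",
    "taa": "thoracic aortic aneurysm",
}
sample = "aaa aaa\n\ttaa"
result = acronym(replacers, sample)
```

Both occurrences of aaa are replaced here, and the newline and tab are left alone. Like the original regex, this still skips an abbreviation that is directly followed by punctuation, such as "aaa,".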

CodePudding user response:

I found that a regular expression with word boundaries (\b) works for this situation.

import re

# replace abbreviations function
def acronym(replacers, text):
    for i, x in replacers.items():
        # \b matches only at word boundaries, so whitespace is left untouched
        text = re.sub(rf'\b{re.escape(i)}\b', x, text)
    return text
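If the dictionary grows, the same word-boundary idea can be applied in a single pass instead of one re.sub call per abbreviation. The following is only a sketch under assumptions: build_replacer, expand, note and expanded are hypothetical names, and matching is case-sensitive, as in the function above.

```python
import re

def build_replacer(replacers):
    """Return a function that expands every abbreviation in one pass."""
    if not replacers:
        # Nothing to replace: return the text unchanged
        return lambda text: text
    # Longest keys first, so a longer abbreviation wins over a shorter one
    # that shares its prefix.
    keys = sorted(replacers, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b"
    )

    def replace_all(text):
        # Look up each matched abbreviation in the dictionary
        return pattern.sub(lambda match: replacers[match.group(0)], text)

    return replace_all

# Illustrative usage
expand = build_replacer({
    "aaa": "abdominal aortic aneurysm",
    "taa": "thoracic aortic aneurysm",
    "clti": "chronic limb threatening ischaemia",
})
note = "Known aaa,\n\tno clti.  Review taa."
expanded = expand(note)
```

Because only the matched words are substituted, newlines, tabs and repeated spaces in note pass through unchanged, and abbreviations followed by punctuation are still expanded.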