How do i extract only abbreviation following acronyms inside the brackets by mapping each Capital le

Time:12-25

 a = "The process maps are similar to Manual Excellence Process Framework (MEPF)"

input = "The process maps are similar to Manual Excellence Process Framework (MEPF)"

output = Manual Excellence Process Framework (MEPF)

I want to write a Python script. Given that piece of text, I want to extract the full form of the acronym inside the brackets, (MEPF), whose full form is Manual Excellence Process Framework. I want to extract only the full form, by matching each uppercase letter inside the brackets.

My idea was that whenever an acronym appears inside brackets, each of its capital letters is mapped to a word. For example, with (MEPF), start from the last letter, F, which matches the last word before the bracket, here Framework; then P (Process), then E (Excellence), and finally M (Manual). The final output is then the full form (Manual Excellence Process Framework). Could you try it this way? That would be really helpful for me.

CodePudding user response:

Using a simple regex and a bit of post-processing:

a = "I like International Business Machines (IBM). The Manual Excellence Process Framework (MEPF)"

import re
m = re.findall(r'([^)]+) \(([A-Z]+)\)', a)
out = {b: ' '.join(a.split()[-len(b):]) for a,b in m}

out

output:

{'IBM': 'International Business Machines',
 'MEPF': 'Manual Excellence Process Framework'}

If you want to check that the acronym actually matches the words:

out = {b: ' '.join(a.split()[-len(b):]) for a,b in m
       if all(x[0]==y for x,y in zip(a.split()[-len(b):], b))
       }

Example:

a = "No match (ABC). I like International Business Machines (IBM). The Manual Excellence Process Framework (MEPF)."

m = re.findall(r'([^)]+) \(([A-Z]+)\)', a)
{b: ' '.join(a.split()[-len(b):]) for a,b in m
 if all(x[0]==y for x,y in zip(a.split()[-len(b):], b))
}

# {'IBM': 'International Business Machines',
#  'MEPF': 'Manual Excellence Process Framework'}
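
One caveat with the check above: zip stops at the shorter of its two inputs, so a phrase with fewer words than the acronym has letters is only compared on the words that exist. For instance "Apple Banana (ABC)" would pass, because only A and B get checked. A minimal sketch of a stricter variant follows, wrapped in a function. The name expand_acronyms is made up for this illustration, and it assumes one capitalised word per acronym letter:

```python
import re

def expand_acronyms(text):
    """Map each parenthesised acronym to the words just before it."""
    pairs = re.findall(r'([^)]+) \(([A-Z]+)\)', text)
    out = {}
    for before, acronym in pairs:
        # Take as many trailing words as the acronym has letters.
        words = before.split()[-len(acronym):]
        # Require one word per letter, so zip cannot silently truncate.
        if len(words) == len(acronym) and all(
            w[0] == c for w, c in zip(words, acronym)
        ):
            out[acronym] = ' '.join(words)
    return out

print(expand_acronyms("Apple Banana (ABC). International Business Machines (IBM)."))
```

With that input only the IBM entry should survive, since "Apple Banana" supplies two words for a three-letter acronym.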

CodePudding user response:

Assuming the acronym in parentheses would always follow the full name, you could use re.findall as follows:

import re

a = "I like International Business Machines (IBM) and also Manual Excellence Process Framework (MEPF)"
matches = re.findall(r'[A-Z][a-z]*(?: [A-Z][a-z]*)* \([A-Z]+\)', a)
print(matches)

This prints:

['International Business Machines (IBM)',
 'Manual Excellence Process Framework (MEPF)']

CodePudding user response:

I took your question as a challenge. I am a beginner, so I hope this answer is useful to you, and thank you for your question:

a = "process maps are similar to Manual Excellence Process Framework (MEPF)"

full = ''
ind = a.index('(')
ind2 = a.index(')')
acr = a[ind + 1:ind2]  # the acronym between the brackets
for i in a.split():
    for j in range(len(acr)):
        # keep any word whose first letter is one of the acronym's letters
        if acr[j] == i[0] and len(i) > 1:
            word = i
            full = full + word + ' '
print(full)
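
The loop above keeps any word in the sentence that starts with one of the acronym's letters, wherever that word appears. The question instead describes walking backward: the last letter is paired with the last word before the bracket, then the next letter to the left with the next word to the left, and so on. A rough sketch of that idea is below. The function name full_form is invented for this example, and it assumes every acronym letter is the capitalised first letter of exactly one word:

```python
import re

def full_form(text):
    """Return {acronym: full form} by walking backward from each bracket."""
    result = {}
    for match in re.finditer(r'\(([A-Z]+)\)', text):
        acronym = match.group(1)
        # Only the words before the opening bracket are candidates.
        words = text[:match.start()].split()
        picked = []
        # Pair the last letter with the last word, then move left.
        for letter in reversed(acronym):
            if not words or words[-1][0] != letter:
                picked = None  # mismatch: skip this acronym
                break
            picked.append(words.pop())
        if picked is not None:
            result[acronym] = ' '.join(reversed(picked))
    return result

a = "The process maps are similar to Manual Excellence Process Framework (MEPF)"
print(full_form(a))
```

An acronym whose letters do not line up with the preceding words, such as "No match (ABC)", is simply left out of the result.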