Detect abbreviations and replace them

I have a list of lists in the following format:

Mylist = [["where is USA?", "How are you in UK?"], 
          ["I need it", "OMG you scared me"]]

I also have a dictionary that maps abbreviations to their expanded forms:

Abbr = {"USA": "United States of America", 
        "UK": "United Kingdom", 
        "ASAP": "As soon as possible", 
        "OMG": "Oh my god"}

I need to iterate through the list, find where the abbreviations occur, and add the expanded version of each matching string, so that the result looks like the following:

Mylist = [["where is USA?", "where is United States of America?",
           "How are you in UK?", "How are you in United Kingdom?"],
          ["I need it", "OMG you scared me", "Oh my god you scared me"]]

CodePudding user response：

Here's an inefficient answer, but it works. Given the nature of the data you're working with, I assume it won't be an especially large dataset where nested loops would be an issue.

Mylist = [["where is USA?", "How are you in UK?"], 
          ["I need it", "OMG you scared me"]]
Abbr = {"USA": "United States of America", 
        "UK": "United Kingdom", 
        "ASAP": "As soon as possible", 
        "OMG": "Oh my god"}

for index, list_level1 in enumerate(Mylist):
    # Iterate over a copy so the strings appended below are not revisited
    for list_level2 in list(list_level1):
        for key, expansion in Abbr.items():
            if key in list_level2:
                Mylist[index].append(list_level2.replace(key, expansion))

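This loops over every string in each sublist, checks whether it contains any key from your dictionary, and if it does, appends the expanded version to the end of that same sublist. Note that the expanded strings end up after all the originals rather than directly next to the string they came from, and that the check is a plain substring match, so "UK" would also match inside a longer word.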
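If you need the expanded string to sit directly after its original, as in the expected result from the question, one possible variation is sketched below. It is not the code from the answer above: it swaps the substring check for a regular expression with word boundaries and builds a new list instead of modifying Mylist in place. The function name expand_abbreviations is made up for this illustration, and the sketch assumes the dictionary is non-empty and that matching should be case-sensitive.

```python
import re

def expand_abbreviations(groups, abbreviations):
    """Return a new list of lists in which every string that contains an
    abbreviation is immediately followed by its expanded version."""
    # Build one alternation pattern. \b restricts matches to whole words,
    # and longer keys are listed first so they win over shorter ones.
    keys = sorted(abbreviations, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")

    result = []
    for group in groups:
        expanded_group = []
        for text in group:
            # Replace every matched abbreviation with its dictionary value
            expanded = pattern.sub(lambda m: abbreviations[m.group(1)], text)
            expanded_group.append(text)
            if expanded != text:
                expanded_group.append(expanded)
        result.append(expanded_group)
    return result

Mylist = [["where is USA?", "How are you in UK?"],
          ["I need it", "OMG you scared me"]]
Abbr = {"USA": "United States of America",
        "UK": "United Kingdom",
        "ASAP": "As soon as possible",
        "OMG": "Oh my god"}

print(expand_abbreviations(Mylist, Abbr))
```

Because of the word boundaries, a string such as "UKULELE" is left alone, and a string with several abbreviations gets a single expanded copy with all of them replaced.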

CodePudding user response：

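You can use the third-party flashtext library (pip install flashtext) to replace keywords much more efficiently. The code below assumes Mylist and Abbr are defined as in the question.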

from flashtext import KeywordProcessor

# Create a KeywordProcessor object
kwp = KeywordProcessor()
# kwp = KeywordProcessor(case_sensitive=True) # if you want it to be case sensitive
# Add the abbreviations and their expanded forms to the keyword processor
for key, value in Abbr.items():
    kwp.add_keyword(key, value)

# Loop through the list of text and use the keyword processor to
# perform replacements on each item and append it back
updated_mylist = []
for items in Mylist:
    updated_items = []
    for item in items:
        # replace the item based on the abbr given
        expanded_item = kwp.replace_keywords(item)
        updated_items.extend(
            [item] if expanded_item == item else [item, expanded_item]
        )
    updated_mylist.append(updated_items)

# Print the updated list
print(updated_mylist)

Output:

[['where is USA?', 'where is United States of America?',
  'How are you in UK?', 'How are you in United Kingdom?'], 
 ['I need it', 'OMG you scared me', 'Oh my god you scared me']]