How to replace compound words in a string using a dictionary?

Time:07-09

I have a dictionary whose key:value pairs map compound words to the expression I want to replace them with in a text. For example, let's say:

terms_dict = {'digi conso': 'digi conso', 'digi': 'digi conso', 'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}

My goal is to create a function replace_terms(text, dict) that takes a text and a dictionary like this one as parameters, and returns the text after replacing the compound words.

For instance, this script:

test_text = "i want a digi conso loan for digiconso" 

print(replace_terms(test_text, terms_dict))

Should return:

"i want a digi conso loan for digi conso"

I have tried using .replace(), but for some reason it doesn't work properly, probably because the terms to replace are made of multiple words.
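The exact .replace() attempt is not shown, so the loop below is only a guess at it (replace_terms_naive is a hypothetical name). It illustrates two likely causes of the problem: str.replace matches substrings anywhere, so "digi" is also found inside "digiconso", and each call runs on the output of the previous one, so text inserted for one key can be matched again by another key.

```python
terms_dict = {'digi conso': 'digi conso', 'digi': 'digi conso', 'digiconso': 'digi conso',
              '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}


def replace_terms_naive(text, terms_dict):
    """Hypothetical naive version: one str.replace call per dictionary entry."""
    for key, value in terms_dict.items():
        # str.replace matches substrings anywhere, including inside longer words
        # and inside text produced by an earlier replacement.
        text = text.replace(key, value)
    return text


print(replace_terms_naive("i want a digi conso loan for digiconso", terms_dict))
# i want a digi conso conso loan for digi consoconso
```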

I also tried this:

import re

def replace_terms(text, terms_dict):
    if len(terms_dict) > 0:
        words_in = [k for k in terms_dict.keys() if k in text]  # ex: words_in = [digi conso, digi, digiconso]
        if len(words_in) > 0:
            for w in words_in:
                pattern = r"\b" + w + r"\b"
                text = re.sub(pattern, terms_dict[w], text)

    return text

But when applied to my text, this function returns "i want a digi conso conso loan for digi conso": the word conso gets doubled. I can see why: the words_in list is built from the dictionary keys before any replacement is made, so after "digi conso" has been handled, the shorter key "digi" still matches the "digi" inside it and adds a second "conso".

Is there an efficient way to do this?

Thanks a lot!

Answer:

A rather quick and wonky way of doing this:

def replace_terms(text, terms):
    replacement_list = []
    check = True
    for term in terms:
        if term in text:
            for r in replacement_list:
                if r[0] == text.index(term):
                    if len(term) > len(r[1]):
                        replacement_list.remove(r)
                    else:
                        check = False
            if check:
                replacement_list.append([text.index(term), term])
            else:
                check = True
    for r in replacement_list:
        text = text.replace(r[1], terms[r[1]])
    return text

Usage:

terms_dict = {
    "digi conso": "digi conso",
    "digi": "digi conso",
    "digiconso": "digi conso",
    "3xcb": "3xcb",
    "3x cb": "3xcb",
    "legal entity identifier": "legal entity identifier"
}

test_text = "i want a digi conso loan for digiconso"

print(replace_terms(test_text, terms_dict))

Result:

i want a digi conso loan for digi conso
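The idea behind this answer, letting the longest key win where several keys start at the same position, can also be written as a single regular expression. This is a sketch, not either answerer's code, and it assumes keys should match case-sensitively and only on word boundaries (\b). It sorts the keys longest-first so the alternation tries "digi conso" before "digi", escapes them with re.escape, and lets re.sub scan the original text once, so replacement text is never matched again.

```python
import re


def replace_terms(text, terms_dict):
    """Replace every key of terms_dict found in text with its value, in a single pass."""
    if not terms_dict:
        # An empty alternation would match empty strings, so bail out early.
        return text
    # Longest keys first: regex alternation takes the first alternative that matches,
    # so "digi conso" must be tried before "digi".
    keys = sorted(terms_dict, key=len, reverse=True)
    # re.escape guards against regex metacharacters in keys; \b keeps matches on word boundaries.
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # re.sub walks the original string left to right; replaced text is never re-scanned.
    return pattern.sub(lambda m: terms_dict[m.group(0)], text)


terms_dict = {
    "digi conso": "digi conso",
    "digi": "digi conso",
    "digiconso": "digi conso",
    "3xcb": "3xcb",
    "3x cb": "3xcb",
    "legal entity identifier": "legal entity identifier",
}

print(replace_terms("i want a digi conso loan for digiconso", terms_dict))
# i want a digi conso loan for digi conso
```

Unlike the sequential loops, the result here does not depend on the order of the dictionary's keys, because every match is looked up against the original text.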

Answer:

This should do it.


terms_dict = { 'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}
test_text = "i want a digi conso loan for digiconso" 
def replace_terms(txt, dct):
    dct = tuple(dct.items())
    for x, y in dct:
        txt = txt.replace(x, y, 1)
    return txt
print(replace_terms(test_text, terms_dict))

First I get the dict pairs and put them in an easier form (a tuple). Then I iterate and replace!

Output:

i want a digi conso loan for digi conso

You had too many extra replacement entries that you did not need. I also made it replace only the first occurrence (the third argument, 1, passed to replace), but you can change that.
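To see why dropping the extra entries matters, the sketch below reruns this answer's loop (renamed replace_terms_sequential here, and iterating dct.items() directly instead of converting to a tuple first) with both the reduced dictionary from this answer and the full dictionary from the question. With 'digi' present, its first occurrence is the "digi" inside "digi conso", so the doubling comes back.

```python
def replace_terms_sequential(txt, dct):
    # Same approach as the answer: one str.replace per entry, first occurrence only.
    for x, y in dct.items():
        txt = txt.replace(x, y, 1)
    return txt


test_text = "i want a digi conso loan for digiconso"

reduced = {'digiconso': 'digi conso', '3xcb': '3xcb', '3x cb': '3xcb',
           'legal entity identifier': 'legal entity identifier'}
full = {'digi conso': 'digi conso', 'digi': 'digi conso', 'digiconso': 'digi conso',
        '3xcb': '3xcb', '3x cb': '3xcb', 'legal entity identifier': 'legal entity identifier'}

print(replace_terms_sequential(test_text, reduced))  # i want a digi conso loan for digi conso
print(replace_terms_sequential(test_text, full))     # i want a digi conso conso loan for digi conso
```

The reduced dictionary avoids the problem, but it also means a standalone "digi" is no longer expanded to "digi conso", which the question's dictionary asked for.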
