How to replace a word in a string, based on a condition?

Time:07-02

I have a column in a dataframe like this.

Text
"Lorum Ipsum Rotterdam dolor sit." 
"ed ut perspiciatis Boekarest, New York, consectetur adipiscing elit, sed " 
"Excepteur sint occaecat Glasgow cupidatat non proident, sunt in culpa"

I want every geographical location to be replaced by "GPE".

I am using spaCy to detect the entities. This works fine, as shown below.

import spacy

nlp = spacy.load('en_core_web_lg')

# Print every entity spaCy finds, with its label as a string (label_)
for value in df['Text']:
    doc = nlp(value)
    for ent in doc.ents:
        print(ent.text, ent.label_)
Output: 
Rotterdam GPE
Boekarest GPE
New York GPE
Glasgow GPE 

I tried the code below to replace the city names in the column, but it doesn't work.

for value in df['Text']:
    doc = nlp(value)
    for ent in doc.ents:
        for word in value.split():
            if ent.label_ == "GPE":
                word.replace(ent.label, "_GPE_")

Does anyone see what I am doing wrong?

Answer:
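Your loop fails for a few reasons. `str.replace` does not modify a string in place (Python strings are immutable); it returns a new string, which your code throws away. `ent.label` is the integer hash of the label, not the entity's text (`ent.text`) or the label string (`ent.label_`). And even a kept replacement would only change the pieces produced by `value.split()`, never `value` or the dataframe; a two-word entity such as "New York" also never matches a single split word. The sketch below is plain Python, with the GPE entity texts hard-coded as a hypothetical stand-in for what `doc.ents` would report:

```python
# Hypothetical stand-in for the GPE entity texts spaCy reported for this row
value = "ed ut perspiciatis Boekarest, New York, consectetur adipiscing elit, sed "
gpe_texts = ["Boekarest", "New York"]

# Mirrors the original loop: replace() returns a new string that is discarded,
# so neither `word` nor `value` changes
for word in value.split():
    word.replace("Boekarest", "_GPE_")
unchanged = value

# Keeping (rebinding) the returned string is what makes a replacement stick
redacted = value
for ent_text in gpe_texts:
    redacted = redacted.replace(ent_text, "GPE")

print(unchanged)
print(redacted)
```

Replacing by text rewrites every occurrence of that string, including ones spaCy did not tag as an entity, so replacing by the entities' character offsets is more precise.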

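You can replace each GPE entity using its character offsets, working from the last entity to the first so that earlier offsets stay valid: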

import warnings

import pandas as pd
import spacy

# Suppress a CUDA warning that some environments print when no GPU is available
warnings.filterwarnings("ignore", "User provided device_type of 'cuda', but CUDA is not available. Disabling")

df = pd.DataFrame({'Text': ["Lorum Ipsum Rotterdam dolor sit.", "ed ut perspiciatis Boekarest, New York, consectetur adipiscing elit, sed ", "Excepteur sint occaecat Glasgow cupidatat non proident, sunt in culpa"]})
nlp = spacy.load('en_core_web_lg')

def redact_gpe(text):
    doc = nlp(text)
    new_string = text
    # Go from the last entity to the first: replacing a span changes the
    # length of everything after it, but not the offsets of spans before it
    for e in reversed(doc.ents):
        if e.label_ == "GPE":
            start = e.start_char
            end = start + len(e.text)  # equivalent to e.end_char
            new_string = f'{new_string[:start]}GPE{new_string[end:]}'
    return new_string

df['Text'] = df['Text'].apply(redact_gpe)

Output:

                                                                   Text
0                                      Lorum Ipsum GPE dolor sit.
1  ed ut perspiciatis GPE, GPE, consectetur adipiscing elit, sed
2     Excepteur sint occaecat GPE cupidatat non proident, sunt in culpa
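The core of `redact_gpe` is splicing the string by character offsets. The same logic, separated from spaCy so it can be tried without downloading a model, might look like the sketch below. The function `replace_spans` and its `(start, end, label)` tuples are hypothetical stand-ins for `(e.start_char, e.end_char, e.label_)`; building the result in a single left-to-right pass with `str.join` is a swapped-in variant of the reversed loop that avoids re-copying the whole string for every entity.

```python
from typing import Iterable, List, Tuple

# (start_char, end_char, label): hypothetical stand-in for a spaCy entity span
EntitySpan = Tuple[int, int, str]


def replace_spans(text: str, spans: Iterable[EntitySpan],
                  target: str = "GPE", placeholder: str = "GPE") -> str:
    """Replace every span labelled `target` with `placeholder`.

    Assumes the spans do not overlap, which holds for spaCy's doc.ents.
    """
    pieces: List[str] = []
    cursor = 0
    for start, end, label in sorted(spans):  # process spans left to right
        if label != target:
            continue
        pieces.append(text[cursor:start])  # untouched text before the entity
        pieces.append(placeholder)
        cursor = end
    pieces.append(text[cursor:])  # remaining text after the last replacement
    return "".join(pieces)


text = "ed ut perspiciatis Boekarest, New York, consectetur adipiscing elit, sed "
# Offsets found with str.find here; with spaCy they would come from doc.ents
spans = [
    (text.find("Boekarest"), text.find("Boekarest") + len("Boekarest"), "GPE"),
    (text.find("New York"), text.find("New York") + len("New York"), "GPE"),
]
print(replace_spans(text, spans))
```

With spaCy, the call would be `replace_spans(text, [(e.start_char, e.end_char, e.label_) for e in nlp(text).ents])`.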