I have a list of entities and a text, like this:
List=["Data Scientist", "Bihar", "Krishna"]
Text=" I am Krishna. I am from Bihar. I want to be a Data Scientist"
I want results like:
"I am [Entity]Krishna[Entity]. I am from [Entity]Bihar[Entity] . I want to be a [Entity]Data Scientist[Entity]"
Please help me with Python code to get this result.
CodePudding user response:
You could use re.sub() with a pattern built from your list of keywords:
import re
entities = ["Data Scientist", "Bihar", "Krishna"]
pattern = r"\b(" + "|".join(map(re.escape, sorted(entities, key=len, reverse=True))) + r")\b"
test = " I am Krishna. I am from Bihar. I want to be a Data Scientist"
result = re.sub(pattern,r"[entity]\1[entity]",test)
print(result)
I am [entity]Krishna[entity]. I am from [entity]Bihar[entity]. I want to be a [entity]Data Scientist[entity]
The search pattern is built by combining the keywords with the pipe operator (|) and enclosing that in a capture group surrounded by word boundaries:
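'\b(Data\ Scientist|Krishna|Bihar)\b'
(re.escape() also escapes the space in "Data Scientist"; this does not change what the pattern matches.)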
The longer keywords are placed first because regex alternation is ordered rather than longest-match: the engine tries the alternatives left to right and takes the first one that succeeds. If one keyword is a prefix of a longer keyword, the longer one has to come first to take precedence.
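To illustrate why the ordering matters, here is a minimal sketch. The helper name build_pattern and the extra keyword "Data" are assumptions added for illustration; they are not part of the original question.

```python
import re

def build_pattern(entities, longest_first=True):
    # Hypothetical helper: join the escaped keywords into one alternation,
    # optionally sorting them so longer keywords come first.
    ordered = sorted(entities, key=len, reverse=True) if longest_first else list(entities)
    return r"\b(" + "|".join(map(re.escape, ordered)) + r")\b"

# Assumed example data: "Data" is a prefix of "Data Scientist".
entities = ["Data", "Data Scientist"]
text = "I want to be a Data Scientist"

# Without sorting, the shorter alternative wins because it is tried first.
naive = re.sub(build_pattern(entities, longest_first=False), r"[entity]\1[entity]", text)
# With longest-first sorting, the full keyword is tagged.
ordered = re.sub(build_pattern(entities), r"[entity]\1[entity]", text)

print(naive)    # I want to be a [entity]Data[entity] Scientist
print(ordered)  # I want to be a [entity]Data Scientist[entity]
```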
CodePudding user response:
Ok, sorry for misreading your original intent. You can do this fairly easily by looking for each list item in your string and replacing it, like so:
List = ["Data Scientist", "Bihar", "Krishna"]
text = ' I am Krishna. I am from Bihar. I want to be a Data Scientist'
for entity in List:
    # Only replace entities that actually occur in the text
    if entity in text:
        text = text.replace(entity, '[Entity]' + entity + '[Entity]')
print(text)
Output:
I am [Entity]Krishna[Entity]. I am from [Entity]Bihar[Entity]. I want to be a [Entity]Data Scientist[Entity]
If you're trying to be XML-like, the closing tag should have a slash: [/Entity]
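A rough sketch of the same loop with a distinct closing tag might look like this. The function name tag_entities and its tag parameter are made up for illustration.

```python
def tag_entities(text, entities, tag="Entity"):
    # Hypothetical helper: wrap every occurrence of each entity
    # in [tag]...[/tag] using plain string replacement.
    for entity in entities:
        text = text.replace(entity, f"[{tag}]{entity}[/{tag}]")
    return text

entities = ["Data Scientist", "Bihar", "Krishna"]
text = "I am Krishna. I am from Bihar. I want to be a Data Scientist"

tagged = tag_entities(text, entities)
print(tagged)
# I am [Entity]Krishna[/Entity]. I am from [Entity]Bihar[/Entity]. I want to be a [Entity]Data Scientist[/Entity]
```

Note that plain str.replace() also matches inside longer words and inside text that was already tagged, so this only works cleanly when the entities do not overlap.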
CodePudding user response:
I think you want something like:
List=["Data Scientist", "Bihar", "Krishna"]
print(f"I am {List[2]}. I am from {List[1]}. I want to be a {List[0]}")
As a general point, List is not a recommended name for a variable in Python, as it is so close to list, which is the name of the built-in type.
You might also consider a dictionary for this style of data where you are querying your object for specific, unordered values.
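For example, the dictionary version could be sketched as follows. The variable name profile and the keys "name", "state" and "goal" are assumptions chosen for illustration.

```python
# Assumed keys; pick whatever names describe your data.
profile = {"name": "Krishna", "state": "Bihar", "goal": "Data Scientist"}

# Values are looked up by key, so their order in the dictionary does not matter.
sentence = (
    f"I am {profile['name']}. I am from {profile['state']}. "
    f"I want to be a {profile['goal']}"
)
print(sentence)  # I am Krishna. I am from Bihar. I want to be a Data Scientist
```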
CodePudding user response:
Using an f-string is the way to go in your situation. This should do it.
List=["Data Scientist", "Bihar", "Krishna"]
text = f'I am {List[2]}. I am from {List[1]}. I want to be a {List[0]}'
print(text)