I'm working on a dataset of tweets with two columns, tweet_id and text, and I want to extract a name from the tweet text.
if text.startswith('This is ') and re.match(r'[A-Z].*', text.split()[2]):
    new_names.append(text.split()[2].strip(',').strip('.'))
This is what I used to extract the name after "This is".
I also want to extract a name that appears in the middle of the text, for example after the phrases "name is" and "named". How do I go about doing that?
CodePudding user response:
If I understand you correctly, this should do it:
import re
s = "This is Shahab .... my name is Shahab .... he is named Gholam"
names_regex = re.compile(r"[Tt]his\sis\s(\w+)|named\s(\w+)|name\sis\s(\w+)")
names = names_regex.findall(s)
print(names)
Output:
[('Shahab', '', ''), ('', '', 'Shahab'), ('', 'Gholam', '')]
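Each tuple in that output has one slot per capture group, so only one slot per match is filled. If a flat list of names is more convenient, one option is to put the trigger phrases in a non-capturing group and keep a single capture group for the name. This is a minimal sketch of that idea on the same sample string; the variable names are only illustrative.

```python
import re

s = "This is Shahab .... my name is Shahab .... he is named Gholam"

# (?:...) groups the alternatives without capturing them, so the
# pattern has exactly one capture group: the word after the phrase.
flat_regex = re.compile(r"(?:[Tt]his\sis|named|name\sis)\s(\w+)")

# With a single capture group, findall returns a list of strings
# instead of a list of tuples.
flat_names = flat_regex.findall(s)
print(flat_names)
```

This should print ['Shahab', 'Shahab', 'Gholam'].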
CodePudding user response:
import re
text = "this pooch's name is Pepper. She's a sweet lovable monster. Although she has a lot of good qualities she also pees in the house and won't stop killing birds! Because of that we gave her an 8/10. Not bad."
m = re.search(r'(?<=name is\s)[A-Za-z]*', text, flags=re.IGNORECASE)
name = m.group(0)
print(name)
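Note that re.search returns None when the phrase is not found, so m.group(0) would raise an AttributeError on tweets without a name. Below is a rough sketch of a helper that covers both "name is" and "named" and returns None when nothing matches. The function name extract_name is hypothetical, and requiring the name to start with a capital letter is an assumption carried over from the question's own [A-Z] check.

```python
import re

# Assumption: a name is a capitalized word directly after "name is" or "named".
NAME_PATTERN = re.compile(r"\b(?:[Nn]ame\s+is|[Nn]amed)\s+([A-Z][a-z]+)")

def extract_name(text):
    """Return the first name found after 'name is' or 'named', else None."""
    m = NAME_PATTERN.search(text)
    # search() returns None when there is no match, so guard before group().
    return m.group(1) if m else None

print(extract_name("this pooch's name is Pepper. She's a sweet lovable monster."))
print(extract_name("We adopted a dog named Rex last week."))
print(extract_name("Just a very good dog. 12/10."))
```

The three calls should print Pepper, Rex and None. For a whole column of tweets, the same function could be applied to each text value in turn.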