I have a Pandas Series:
Explanation
a "how are you doing today where is she going"
b "do you like blueberry ice cream does not make sure "
c "this works but you know that the translation is on"
I want to extract the two words before and after the word "you".
For example, I want the result to look something like this:
Explanation Explanation Extracted
a "how are you doing today where is she going" "how are you doing today"
b "do you like blueberry ice cream does not make sure " "do you like blueberry ice"
c "this works but you know that the translation is on" "works but you know that"
This regex gives me the two words before and after "you", but doesn't include "you" itself:
(?P<before>(?:\w+\W+){,2})you\W+(?P<after>(?:\w+\W+){,2})
How do I change it so that "you" is included?
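A side note, as a minimal sketch: the named groups `before` and `after` exclude "you", but the overall match (`group(0)` in Python's `re` module) already spans "you" as well. So with plain `re.search` you can simply take the whole match. This assumes the intended pattern uses `\w+` and `\W+`; the helper name `window_around_you` is hypothetical.

```python
import re

# The asker's pattern (assumption: quantifiers are \w+ and \W+).
PATTERN = re.compile(r"(?P<before>(?:\w+\W+){,2})you\W+(?P<after>(?:\w+\W+){,2})")

def window_around_you(text):
    """Hypothetical helper: return the whole match (group 0), which includes "you".

    Returns None when the pattern does not match. The trailing separator
    consumed by the last \W+ is stripped off.
    """
    m = PATTERN.search(text)
    return m.group(0).strip() if m else None

print(window_around_you("how are you doing today where is she going"))
```

Note that `Series.str.extract` only returns capture groups, not group 0, which is why the answers below wrap the whole window in a single capturing group instead.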
Answer:
You can use:
df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)
See the regex demo.
Details:
\b - a word boundary
(?:\w+\W+){0,2} - zero, one or two occurrences of one or more word chars followed by one or more non-word chars
you - the literal string "you"
\b - a word boundary
(?:\W+\w+){0,2} - zero, one or two occurrences of one or more non-word chars followed by one or more word chars
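The breakdown above maps one-to-one onto a verbose regex. As a sketch (using Python's `re.VERBOSE` flag, which ignores whitespace and allows `#` comments in the pattern), the same expression can be written with each detail as an inline comment and checked on plain strings without pandas:

```python
import re

# Same pattern as the answer, spread out with re.VERBOSE so each part is documented.
PATTERN = re.compile(r"""
    \b                 # a word boundary
    (                  # group 1: the text we want to keep
      (?:\w+\W+){0,2}  # zero to two pairs of (word chars, then non-word chars)
      you\b            # the literal "you", followed by a word boundary
      (?:\W+\w+){0,2}  # zero to two pairs of (non-word chars, then word chars)
    )
""", re.VERBOSE)

samples = [
    "how are you doing today where is she going",
    "do you like blueberry ice cream does not make sure ",
    "this works but you know that the translation is on",
]

for s in samples:
    m = PATTERN.search(s)
    print(m.group(1) if m else None)
```

Because the pattern only has one capturing group, `str.extract(..., expand=False)` returns a Series rather than a DataFrame.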
A Pandas test:
>>> import pandas as pd
>>> df = pd.DataFrame({'Explanation':["how are you doing today where is she going", "do you like blueberry ice cream does not make sure ", "this works but you know that the translation is on"]})
>>> df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)
>>> df
Explanation Explanation Extracted
0 how are you doing today where is she going how are you doing today
1 do you like blueberry ice cream does not make ... do you like blueberry
2 this works but you know that the translation i... works but you know that
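If you need a different word or window size, one option is to build the pattern from parameters. This is a sketch with hypothetical helper names (`window_pattern`, `extract_window`); it uses `re.escape` so the word is matched literally, and it assumes the word starts and ends with word characters (otherwise the `\b` anchors will not behave as intended). The returned pattern string could also be passed to `df['Explanation'].str.extract(..., expand=False)`.

```python
import re

def window_pattern(word, n=2):
    """Hypothetical helper: regex capturing up to n words on each side of `word`."""
    w = re.escape(word)  # treat the word literally
    # Doubled braces produce literal {0,n} quantifiers inside the f-string.
    return rf"\b((?:\w+\W+){{0,{n}}}{w}\b(?:\W+\w+){{0,{n}}})"

def extract_window(text, word="you", n=2):
    """Return the first window around `word` in `text`, or None if absent."""
    m = re.search(window_pattern(word, n), text)
    return m.group(1) if m else None

print(extract_window("this works but you know that the translation is on", n=1))
```

With the defaults, `window_pattern("you")` produces exactly the pattern used in the answer.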
Answer:
Here is a way without regex or pandas; for this case I don't think either is needed.
text1 = "how are you doing today where is she going"
text2 = "do you like blueberry ice cream does not make sure "
text3 = "this works but you know that the translation is on"

def show_trunc_sentence(text, word='you'):  # you can pass another word; 'you' is the default
    words = text.split()
    word_loc = words.index(word)  # position of the first occurrence of `word`
    start = max(word_loc - 2, 0)  # don't slice before the start of the sentence
    before = words[start:word_loc + 1]  # up to two words before, plus the word itself
    after = words[word_loc + 1:word_loc + 3]  # up to two words after
    print(" ".join(before + after))

for text in (text1, text2, text3):
    show_trunc_sentence(text)
Output:
text1 - how are you doing today
text2 - do you like blueberry
text3 - works but you know that
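One caveat with the split-based approach: `list.index` raises `ValueError` when the word is not in the list, and splitting on whitespace means "you," or "you?" will not match "you". A minimal variant (the name `trunc_sentence` is hypothetical) returns the string instead of printing it, returns None when the word is missing, and makes the window size a parameter:

```python
def trunc_sentence(text, word="you", n=2):
    """Hypothetical variant of show_trunc_sentence.

    Returns up to n words on each side of the first occurrence of `word`,
    or None if `word` is absent. Splits on whitespace only, so punctuation
    attached to the word (e.g. "you,") prevents a match.
    """
    words = text.split()
    try:
        loc = words.index(word)  # first occurrence only
    except ValueError:
        return None
    start = max(loc - n, 0)  # clamp at the start of the sentence
    # Slicing past the end of the list is safe in Python.
    return " ".join(words[start:loc + n + 1])

for t in ("how are you doing today where is she going",
          "do you like blueberry ice cream does not make sure ",
          "this works but you know that the translation is on"):
    print(trunc_sentence(t))
```

Since it returns a value, it can also be applied to a column, e.g. `df['Explanation'].apply(trunc_sentence)`.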