Extracting the 2 words before, the word itself, and the 2 words after a specific string in Python?

Time:06-30

I have a Pandas series

       Explanation 

a      "how are you doing today where is she going" 
b      "do you like blueberry ice cream does not make sure " 
c      "this works but you know that the translation is on" 

I want to extract the 2 words before and after the string "you"

for example, I want it to be something like

        Explanation                                                    Explanation Extracted

a      "how are you doing today where is she going"                  "how are you doing today"
b      "do you like blueberry ice cream does not make sure "         "do you like blueberry"
c      "this works but you know that the translation is on"          "works but you know that"

This regular expression gives me the two words before and after "you", but doesn't include "you" itself:

(?P<before>(?:\w+\W+){,2})you\W+(?P<after>(?:\w+\W+){,2})

How do I change it so that "you" is included?

CodePudding user response:

You can use

df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)

See the regex demo.

Details:

  • \b - a word boundary
  • (?:\w+\W+){0,2} - zero, one or two occurrences of one or more word chars followed by one or more non-word chars
  • you - the literal string you
  • \b - a word boundary
  • (?:\W+\w+){0,2} - zero, one or two occurrences of one or more non-word chars followed by one or more word chars.
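
The breakdown above can be checked without pandas. This is a minimal sketch using only the standard re module; the names PATTERN and extract_context are made up for illustration and are not part of the answer, and returning None when nothing matches is an assumption.

```python
import re

# Hypothetical names for illustration: PATTERN and extract_context.
# Same pattern as in the answer: up to two words before "you", then up to two words after.
PATTERN = re.compile(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})')


def extract_context(text):
    """Return the matched window around 'you', or None if 'you' does not occur as a word."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_context("how are you doing today where is she going"))
print(extract_context("do you like blueberry ice cream does not make sure "))
print(extract_context("no match here"))
```

Because of the {0,2} quantifiers, a sentence where "you" is the first or second word still matches; it just yields fewer words before it.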

A Pandas test:

>>> import pandas as pd
>>> df = pd.DataFrame({'Explanation':["how are you doing today where is she going", "do you like blueberry ice cream does not make sure ", "this works but you know that the translation is on"]})
>>> df['Explanation Extracted'] = df['Explanation'].str.extract(r'\b((?:\w+\W+){0,2}you\b(?:\W+\w+){0,2})', expand=False)
>>> df
                                         Explanation    Explanation Extracted
0         how are you doing today where is she going  how are you doing today
1  do you like blueberry ice cream does not make ...    do you like blueberry
2  this works but you know that the translation i...  works but you know that

CodePudding user response:

I will show a way with no regex and no pandas; for this case I don't see either as needed.

text1 = "how are you doing today where is she going"
text2 = "do you like blueberry ice cream does not make sure "
text3 = "this works but you know that the translation is on"


def show_trunc_sentence(text, word='you'):  # 'you' is the default, but any other word can be passed
    words = text.split()
    word_loc = words.index(word)  # raises ValueError if the word is not in the text
    start = max(word_loc - 2, 0)  # do not run past the start of the sentence
    before = words[start:word_loc + 1]  # up to 2 words before, plus the word itself
    after = words[word_loc + 1:word_loc + 3]  # up to 2 words after
    print(" ".join(before + after))


show_trunc_sentence(text2)

Outputs:

text1 - how are you doing today
text2 - do you like blueberry
text3 - works but you know that
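
If the result is needed as a value rather than printed (for example, to pass the function to Series.apply), a returning variant might look like the sketch below. The name trunc_sentence and the width parameter are hypothetical, and returning None when the word is missing is an assumption, not something the answer specifies.

```python
def trunc_sentence(text, word='you', width=2):
    """Return `word` with up to `width` words on each side, or None if it is absent."""
    words = text.split()
    if word not in words:
        return None  # assumption: signal "no match" with None instead of raising ValueError
    loc = words.index(word)  # position of the first occurrence
    start = max(loc - width, 0)  # clamp so the slice never starts before index 0
    return " ".join(words[start:loc + width + 1])


print(trunc_sentence("do you like blueberry ice cream does not make sure "))
```

Unlike the regex, this compares whole whitespace-separated tokens, so a token such as "you," with trailing punctuation would not match.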
