pandas remove all words before a specific word and get the first n words after that specific word

I have a dataframe like this:

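import pandas as pd

# Wrap the value in a list: a dict of bare scalars raises ValueError without an index
df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})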
df

caption
hello this pack is for you: Jake Peralta. Thanks
...
...
...

I'm trying to get the recipient's first and last name here. The format of the caption column is always the same, so I want to delete everything before "for you:" and keep the first 2 words after it (this number may change).

CodePudding user response：

Takes care of leading spaces in name:

>>> df.caption.str.split(".").str[0].str.split(":").str[1].str.strip()

0    Jake Peralta
Name: caption, dtype: object

CodePudding user response：

Here is one way:

df.caption.apply(lambda st: st[st.find(":") + 2:st.find(".")])

output:

0     Jake Peralta
Name: caption, dtype: object
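
One caveat with slicing on find(): str.find returns -1 when the marker is missing, so a caption without ":" or "." quietly yields a wrong slice instead of an error. Below is a minimal sketch of a guarded version. The helper name between_markers and its default arguments are made up for illustration and are not part of pandas.

```python
import pandas as pd

def between_markers(text, start=":", end="."):
    # Hypothetical helper: text between the first `start` and the next `end`.
    i = text.find(start)
    if i == -1:
        return None  # start marker missing, nothing to extract
    j = text.find(end, i)
    stop = j if j != -1 else len(text)  # no end marker: take the rest of the string
    return text[i + len(start):stop].strip()

df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})
names = df['caption'].apply(between_markers)
print(names)
```

Searching for the end marker from position i onward also avoids matching a "." that appears before the ":".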

CodePudding user response：

Maybe you can try it like this:

df['caption'].str.split("for you: ").str[1].str.split('.').str[0]

output:

0    Jake Peralta
1      first last
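
The answers above all rely on the "." after the name, so none of them handles the "first n words, and n may change" part of the question. Here is a rough sketch of one way to do that, assuming a "word" is a run of letters, digits or underscores (so names like O'Brien or Smith-Jones would be split in two). The function name first_words_after and its parameters are hypothetical, chosen only for this example.

```python
import pandas as pd

def first_words_after(captions, marker="for you:", n=2):
    # Hypothetical helper: first n words that follow `marker` in each caption.
    tail = captions.str.partition(marker)[2]   # text after the first occurrence of marker
    words = tail.str.findall(r"\w+")           # list of word tokens, punctuation dropped
    return words.str[:n].str.join(" ")         # keep the first n tokens and rejoin them

df = pd.DataFrame({'caption': ['hello this pack is for you: Jake Peralta. Thanks']})
df['recipient'] = first_words_after(df['caption'], n=2)
print(df['recipient'])
```

If the marker is absent from a caption, partition leaves the tail empty, so the result for that row is an empty string rather than an error.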