How to split a string into two substrings using a string match in a data frame

I have a dataframe with a sentences column and a column holding a word (or phrase) that appears in each sentence. I want to match that word against the sentence, split the sentence into the part before the match and the part after it, and place the two parts in separate columns, as shown below.

I have df1

Sentence	word
me and John went to the area within 20 minutes	went to
I ran out of the house and jumped to a conclusion	jumped

I want to create df2 as below.

Sentence	word	source	target
me and John went to the area within 20 minutes	went to	me and John	the area within 20 minutes
I ran out of the house and jumped to a conclusion	jumped	I ran out of the house and	to a conclusion

CodePudding user response：

Python's str.split method takes the separator to split on, so you can split each sentence on its matching word:

import pandas as pd

data = pd.DataFrame({"sentences": ["me and John went to the area within 20 minutes",
                                   "I ran out of the house and jumped to a conclusion"],
                     "words": ["went to", "jumped"]})

# Split each sentence once on its word, then strip the spaces left around the match.
parts = [s.split(w, 1) for s, w in zip(data["sentences"], data["words"])]
data["source"] = [p[0].strip() for p in parts]
data["target"] = [p[-1].strip() for p in parts]
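
One case the split approach does not distinguish is a row whose word is missing from the sentence: source and target would both end up as the whole sentence. Below is a minimal sketch using str.partition, assuming you would rather get None in both columns for such rows. The helper name split_on_word is hypothetical, and the column names follow df1 from the question.

```python
import pandas as pd


def split_on_word(sentence, word):
    """Return (source, target) around the first occurrence of word.

    Hypothetical helper: returns (None, None) when word is not found.
    """
    before, sep, after = sentence.partition(word)
    if not sep:  # partition returns an empty separator when there is no match
        return None, None
    return before.strip(), after.strip()


df1 = pd.DataFrame({
    "Sentence": ["me and John went to the area within 20 minutes",
                 "I ran out of the house and jumped to a conclusion"],
    "word": ["went to", "jumped"],
})

# Build (source, target) pairs row by row, then attach them as new columns.
pairs = [split_on_word(s, w) for s, w in zip(df1["Sentence"], df1["word"])]
df2 = df1.assign(source=[p[0] for p in pairs], target=[p[1] for p in pairs])
print(df2[["source", "target"]])
```

str.partition splits only on the first occurrence and always returns three parts, so the match check is a single test on the separator.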

CodePudding user response：

You could use a regex and build the new dataframe row by row:

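import re
import pandas as pd

# df1 is the question's dataframe with "Sentence" and "word" columns.
df2 = {"Sentence": [], "word": [], "source": [], "target": []}

for _, row in df1.iterrows():
    # re.escape keeps any regex metacharacters in the word literal.
    # Assumes the word occurs in the sentence; otherwise re.search returns None.
    match = re.search(f"(.*?){re.escape(row['word'])}(.*)", row["Sentence"])
    df2["Sentence"].append(row["Sentence"])
    df2["word"].append(row["word"])
    df2["source"].append(match.group(1).strip())
    df2["target"].append(match.group(2).strip())
df2 = pd.DataFrame(df2)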

output: