I have a dataframe with a sentences column and a column holding a word (or phrase) that appears in each sentence. I want to match the word against its sentence, split the sentence at that word into two parts, and place the parts in separate columns, as shown below.
I have df1
Sentence | word |
---|---|
me and John went to the area within 20 minutes | went to |
I ran out of the house and jumped to a conclusion | jumped |
I want to create df2 as below.
Sentence | word | source | target |
---|---|---|---|
me and John went to the area within 20 minutes | went to | me and John | the area within 20 minutes |
I ran out of the house and jumped to a conclusion | jumped | I ran out of the house and | to a conclusion |
CodePudding user response:
The str.split method lets you specify what to split on, so you can split each sentence on its word:
import pandas as pd

data = pd.DataFrame({"sentences":["me and John went to the area within 20 minutes","I ran out of the house and jumped to a conclusion"],
"words":["went to","jumped"]})
# Split each sentence once on its word; strip() removes the spaces left next to the word.
source,target = zip(*[(s.split(t, 1)[0].strip(), s.split(t, 1)[-1].strip()) for s,t in zip(data["sentences"],data["words"])])
data["source"] = source
data["target"] = target
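If the word might be missing from a sentence, or occur more than once, str.partition is a possible alternative: it splits on the first occurrence only and always returns three parts. This is only a minimal sketch; the helper name split_on_word is made up for illustration, and returning None for a missing word is an assumption, not something the question specifies.

```python
import pandas as pd


def split_on_word(sentence, word):
    """Return (source, target): the text before and after the first occurrence of word."""
    before, sep, after = sentence.partition(word)
    if not sep:
        # Word not found, so there is nothing to split on (None is an assumption of this sketch).
        return None, None
    return before.strip(), after.strip()


data = pd.DataFrame({
    "sentences": [
        "me and John went to the area within 20 minutes",
        "I ran out of the house and jumped to a conclusion",
    ],
    "words": ["went to", "jumped"],
})

# Build one (source, target) pair per row, then unpack the pairs into two columns.
pairs = [split_on_word(s, w) for s, w in zip(data["sentences"], data["words"])]
data["source"] = [p[0] for p in pairs]
data["target"] = [p[1] for p in pairs]
print(data[["source", "target"]])
```

Unlike indexing the result of split with [0] and [-1], this never silently returns the whole sentence as both source and target when the word is absent.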
CodePudding user response:
You could use a regex and build the new dataframe:
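import re
import pandas as pd

# df1 is the dataframe from the question, with columns "Sentence" and "word".
df2 = {"Sentence":[], "word":[], "source":[] ,"target":[]}
for _, row in df1.iterrows():
    # re.escape protects against regex metacharacters in the word;
    # the lazy (.*?) stops at the first occurrence, and " ?" drops the spaces around the word.
    source, target = re.findall(f'(.*?) ?{re.escape(row["word"])} ?(.*)', row["Sentence"])[0]
    df2['Sentence'].append(row["Sentence"])
    df2['word'].append(row["word"])
    df2['source'].append(source)
    df2['target'].append(target)
df2 = pd.DataFrame(df2)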
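If the word should only match as a whole word (so that "went" does not match inside "wentworth"), the regex can be wrapped in \b word boundaries and passed to re.split with maxsplit=1. A rough sketch under those assumptions follows; split_whole_word is a hypothetical helper name, and \b only behaves as expected when the word starts and ends with a letter, digit, or underscore.

```python
import re

import pandas as pd


def split_whole_word(sentence, word):
    """Split sentence around the first whole-word occurrence of word."""
    # \b assumes the word starts and ends with a word character.
    pattern = rf"\b{re.escape(word)}\b"
    parts = re.split(pattern, sentence, maxsplit=1)
    if len(parts) != 2:
        # No whole-word match (returning None is an assumption of this sketch).
        return None, None
    return parts[0].strip(), parts[1].strip()


df1 = pd.DataFrame({
    "Sentence": [
        "me and John went to the area within 20 minutes",
        "I ran out of the house and jumped to a conclusion",
    ],
    "word": ["went to", "jumped"],
})

# One (source, target) pair per row, aligned to df1's index before joining.
pairs = [split_whole_word(s, w) for s, w in zip(df1["Sentence"], df1["word"])]
parts = pd.DataFrame(pairs, columns=["source", "target"], index=df1.index)
df2 = df1.join(parts)
print(df2)
```

Building the two new columns in one step avoids the row-by-row appends, which may matter on larger dataframes.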