I have a DataFrame:
import pandas as pd
data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]}
df = pd.DataFrame(data)
I want to use the [begin, end] pair in the begin_end column to extract the words from the text column into a new column, like text[128:139+1] (the end index is inclusive). So the result should be:
begin_end new_col
0 [128, 139] have visited
1 [20, 31] when he came
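The `+1` matters because Python slices exclude their stop index, while the question's end value (139) points at the last character of the wanted text. A quick check on the first string, assuming that inclusive reading of begin_end:

```python
text = ('They say that all cats land on their feet, but this does not apply to my cat. '
        'He not only often falls, but also jumps badly. We have visited the veterinarian '
        'more than once with dislocated paws and damaged internal organs.')
begin, end = 128, 139

print(repr(text[begin:end]))      # 'have visite'  (stop index excluded)
print(repr(text[begin:end + 1]))  # 'have visited' (end treated as inclusive)
```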
Answer:
Use a list comprehension (a plain Python loop) over the two columns:
df['new_col'] = [s[a:b + 1] for s, (a, b) in zip(df['text'], df['begin_end'])]
output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
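A self-contained version of the list-comprehension answer, reusing the question's data so it can be run as-is. `zip` pairs each string with its [begin, end] list, which unpacks directly into `(a, b)`; the `+ 1` again assumes an inclusive end index:

```python
import pandas as pd

df = pd.DataFrame({
    'text': [
        'They say that all cats land on their feet, but this does not apply to my cat. '
        'He not only often falls, but also jumps badly. We have visited the veterinarian '
        'more than once with dislocated paws and damaged internal organs.',
        'Mom called dad, and when he came home, he took moms car and drove to the store',
    ],
    'begin_end': [[128, 139], [20, 31]],
})

# Each [begin, end] list unpacks into (a, b); b + 1 makes the end inclusive.
df['new_col'] = [s[a:b + 1] for s, (a, b) in zip(df['text'], df['begin_end'])]
print(df[['begin_end', 'new_col']])
```

Iterating with `zip` avoids building a Series per row, which is what `DataFrame.apply(..., axis=1)` does.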
Answer:
You can also do this with a small helper function and DataFrame.apply:
import pandas as pd
data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]})
data
output:
text begin_end
0 They say that all cats land on their feet, but... [128, 139]
1 Mom called dad, and when he came home, he took... [20, 31]
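Then apply a function row by row:
def getString(string, location):
    # Slice only when begin < end; the end index is inclusive, hence + 1.
    # You can add more checks here. If the check fails, the function returns None.
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]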
data['new_col']= data.apply(lambda x : getString(x['text'],x['begin_end']),axis=1)
data
output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
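Because getString has no `else` branch, a row whose begin is not less than its end gets no value (None, shown by pandas as a missing entry) in new_col. A small sketch of that behaviour; the second row's reversed range [31, 20] is hypothetical data added only to illustrate:

```python
import pandas as pd

def getString(string, location):
    # Return the inclusive slice only when begin < end; otherwise fall through to None.
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]

# Hypothetical data: the second row has a reversed range on purpose.
data = pd.DataFrame({
    'text': ['Mom called dad, and when he came home', 'Mom called dad, and when he came home'],
    'begin_end': [[20, 31], [31, 20]],
})
data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
print(data['new_col'].isna().tolist())  # [False, True]
```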
Answer:
Try this:
df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end'] + 1], axis=1)
Output:
text begin_end begin end new_col
0 ... [128, 139] 128 139 have visited
1 ... [20, 31] 20 31 when he came
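The begin and end columns can also be built without lambdas: pandas' `.str` accessor indexes into list elements too, so `df['begin_end'].str[0]` takes the first item of each list. A sketch using that, still assuming an inclusive end index, with a `zip` comprehension in place of the row-wise `apply`:

```python
import pandas as pd

df = pd.DataFrame({
    'text': [
        'They say that all cats land on their feet, but this does not apply to my cat. '
        'He not only often falls, but also jumps badly. We have visited the veterinarian '
        'more than once with dislocated paws and damaged internal organs.',
        'Mom called dad, and when he came home, he took moms car and drove to the store',
    ],
    'begin_end': [[128, 139], [20, 31]],
})

# .str[i] picks the i-th element of each list stored in the object column.
df['begin'] = df['begin_end'].str[0]
df['end'] = df['begin_end'].str[1]

# Slice each row; + 1 because the end index is treated as inclusive.
df['new_col'] = [s[b:e + 1] for s, b, e in zip(df['text'], df['begin'], df['end'])]
print(df[['begin', 'end', 'new_col']])
```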
If begin and end are already stored in separate columns, you do not need to extract them from begin_end. Where possible, avoid storing lists inside a pd.Series() at all.
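One way to follow that advice is to expand the list column once into two scalar integer columns and drop it. A minimal sketch, assuming every begin_end entry has exactly two items; it uses `DataFrame.join` on a frame built from `tolist()`:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['Mom called dad, and when he came home, he took moms car and drove to the store'],
    'begin_end': [[20, 31]],
})

# Expand the list column into two named integer columns, then drop the list column.
bounds = pd.DataFrame(df['begin_end'].tolist(), index=df.index, columns=['begin', 'end'])
df = df.drop(columns='begin_end').join(bounds)

# With plain integer columns, no per-row list unpacking is needed.
df['new_col'] = [s[b:e + 1] for s, b, e in zip(df['text'], df['begin'], df['end'])]
print(df)
```

Passing `index=df.index` keeps the new columns aligned with the original rows even if the index is not a simple range.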