I have a dataframe as follows,
import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']], 'info':['school','door','play']})
I would like to repeat each word in the text column whose corresponding 'pos' is 'VERB'. So far I have done the following,
df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)
and then I tried to repeat the specific rows in this manner,
print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))
but it does not return anything. My desired output would be,
new_df
text pos info
0 I PRON school
1 go VERB school
2 go VERB school
3 to ADP school
4 school NOUN school
5 open VERB door
6 open VERB door
7 the DET door
8 green ADJ door
9 door NOUN door
10 go VERB play
11 go VERB play
12 out ADP play
13 and CCONJ play
14 play VERB play
15 play VERB play
CodePudding user response:
If the index is not important you can use:
df2 = (df.assign(text=df['text'].str.split())
.explode(['text', 'pos'], ignore_index=True)
)
df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
.sort_index().reset_index(drop=True)
)
Alternative using repeat (and df2 from above):
df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
.reset_index(drop=True)
)
output:
text pos info
0 I PRON school
1 go VERB school
2 go VERB school
3 to ADP school
4 school NOUN school
5 open VERB door
6 open VERB door
7 the DET door
8 green ADJ door
9 door NOUN door
10 go VERB play
11 go VERB play
12 out ADP play
13 and CCONJ play
14 play VERB play
15 play VERB play
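As for why the attempt in the question did not give the expected rows: `Index.repeat` takes a repeat count per row, so a boolean mask used as counts means 0 copies for non-verbs and 1 copy for verbs, which keeps only the verb rows. Adding 1 to the mask turns the counts into 1 and 2. Below is a minimal sketch of that difference, assuming pandas 1.3 or later (needed for multi-column `explode`); the names `is_verb`, `only_verbs` and `counts` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
            ['VERB', 'DET', 'ADJ', 'NOUN'],
            ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})

# One row per token; 'info' is broadcast to every token of its sentence.
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True))

is_verb = df2['pos'].eq('VERB')

# Mask used directly as counts (cast to int for clarity):
# non-verbs get 0 copies, verbs get 1, so only verb rows survive.
only_verbs = df2.loc[df2.index.repeat(is_verb.astype(int))]

# Mask + 1: non-verbs get 1 copy, verbs get 2.
counts = is_verb.add(1)
df_new = df2.loc[df2.index.repeat(counts)].reset_index(drop=True)

print(only_verbs['text'].tolist())
print(df_new['text'].tolist())
```

The same idea extends to other multipliers: for example, `is_verb.mul(2).add(1)` would give three copies of each verb row.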