pandas: repeat a row if a column contains a certain value

Time:05-03

I have a DataFrame as follows:

import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
               'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']], 'info':['school','door','play']})

I would like to repeat each word in the 'text' column whose corresponding 'pos' is 'VERB'. This is what I have done so far:

df['text'] = df['text'].str.split()
df_new = df.apply(pd.Series.explode)

Then I tried to repeat the matching rows like this:

print(df_new.loc[df_new.index.repeat(df_new['pos']=='VERB')].reset_index(drop=True))

But it does not return anything. My desired output would be:

    new_df 
       text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10       go   VERB    play
11       go   VERB    play
12      out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play

CodePudding user response:

If the original index is not important, you can use the following (exploding several columns in one call requires pandas 1.3 or later):

df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True)
      )

df_new = (pd.concat([df2, df2[df2['pos'].eq('VERB')]])
            .sort_index().reset_index(drop=True)
          )

Alternative using repeat (with df2 from above). Adding 1 to the boolean mask gives a repeat count of 2 for verbs and 1 for every other row:

df_new = (df2.loc[df2.index.repeat(df2['pos'].eq('VERB').add(1))]
             .reset_index(drop=True)
          )

Output:

      text    pos    info
0        I   PRON  school
1       go   VERB  school
2       go   VERB  school
3       to    ADP  school
4   school   NOUN  school
5     open   VERB    door
6     open   VERB    door
7      the    DET    door
8    green    ADJ    door
9     door   NOUN    door
10      go   VERB    play
11      go   VERB    play
12     out    ADP    play
13     and  CCONJ    play
14    play   VERB    play
15    play   VERB    play
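
To see why the repeat counts matter, here is a minimal self-contained sketch that reuses the data from the question. The names is_verb, mask_only, counts and via_concat are only illustrative and do not appear in the original posts. It converts the mask to integers explicitly so the counts are easy to read: a bare 0/1 mask keeps only the verb rows, once each, while mask + 1 keeps every row and doubles the verbs.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['I go to school', 'open the green door', 'go out and play'],
    'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
            ['VERB', 'DET', 'ADJ', 'NOUN'],
            ['VERB', 'ADP', 'CCONJ', 'VERB']],
    'info': ['school', 'door', 'play'],
})

# One row per token; exploding two columns at once needs pandas >= 1.3.
df2 = (df.assign(text=df['text'].str.split())
         .explode(['text', 'pos'], ignore_index=True))

is_verb = df2['pos'].eq('VERB')

# Repeat counts of 0/1: non-verb rows are dropped, verb rows appear once.
mask_only = df2.loc[df2.index.repeat(is_verb.astype(int))]

# Repeat counts of 1/2: every row is kept, verb rows appear twice.
counts = is_verb.astype(int) + 1
df_new = df2.loc[df2.index.repeat(counts)].reset_index(drop=True)

# The concat approach: append the verb rows, then sort by index so each
# copy lands next to its original row.
via_concat = (pd.concat([df2, df2[is_verb]])
                .sort_index()
                .reset_index(drop=True))

print(len(df2), len(mask_only), len(df_new))  # 12 4 16
print(df_new.equals(via_concat))
```

With this data both approaches should produce the same 16-row frame. The repeat version extends more naturally if you ever need a different number of copies per row, since you only have to change the counts.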