If a word is in a DataFrame column, replace the word with another and make a new row with the new info

I have two DataFrames, each with the columns ['abstract', 'text', 'label']. If a target word appears in the text column of dataframe1, I want to replace that word with another, build a new row from the result, and add the new row to dataframe2. This should happen for every row that contains the target word. For example, if 'beautiful' appears in the text column:

abstract:'123'
text: 'this is a beautiful day'
label:'good'

Then make the following data and add to other DataFrame:

abstract:'bf'
text: 'this is a bf day'
label:'beautiful'

CodePudding user response:

You can use pandas' vectorized string methods Series.str.contains (to select the matching rows) and Series.str.replace (to swap the word in the text):

import pandas as pd

df1 = pd.DataFrame({'abstract': ['123', 'other', 'more'],
                    'text': ['this is a beautiful day', 
                             'this is not', 'beautiful too'],
                    'label': ['good', 'bad', 'good']})
df2 = pd.DataFrame(columns=df1.columns)

target = 'beautiful'
abbrev = 'bf'

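# Keep only the rows whose text contains the target (plain substring match)
new_rows = df1[df1.text.str.contains(target, regex=False)].copy()
new_rows['abstract'] = abbrev
new_rows['text'] = new_rows.text.str.replace(target, abbrev, regex=False)
new_rows['label'] = target

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df2 = pd.concat([df2, new_rows])
df2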

    abstract    text                label
0   bf          this is a bf day    beautiful
2   bf          bf too              beautiful
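
Note that a plain substring match also hits longer words such as 'beautifully'. If only whole words should count, one option is a regex with word boundaries. The sketch below is just an illustration under that assumption: the helper name make_abbrev_rows, the extra sample row 'beautifully done', and the pre-filled df2 are made up for the example and are not part of the question.

```python
import re

import pandas as pd


def make_abbrev_rows(df, target, abbrev):
    """Build new rows for every row of df whose 'text' has target as a whole word.

    Hypothetical helper for illustration; column names follow the question.
    """
    # \b matches at word boundaries; re.escape protects against regex
    # metacharacters inside the target word.
    pattern = rf"\b{re.escape(target)}\b"
    # na=False treats missing text as "no match" instead of propagating NaN.
    mask = df["text"].str.contains(pattern, regex=True, na=False)
    new_rows = df.loc[mask].copy()
    new_rows["abstract"] = abbrev
    # A callable replacement inserts abbrev literally, with no escape handling.
    new_rows["text"] = new_rows["text"].str.replace(
        pattern, lambda m: abbrev, regex=True
    )
    new_rows["label"] = target
    return new_rows


# Sample data (assumed): the second row contains 'beautifully', which a
# whole-word match should skip.
df1 = pd.DataFrame({
    "abstract": ["123", "other", "more"],
    "text": ["this is a beautiful day", "beautifully done", "beautiful too"],
    "label": ["good", "bad", "good"],
})
df2 = pd.DataFrame({
    "abstract": ["x"],
    "text": ["already here"],
    "label": ["old"],
})

new_rows = make_abbrev_rows(df1, "beautiful", "bf")
# ignore_index=True renumbers the combined rows from 0.
df2 = pd.concat([df2, new_rows], ignore_index=True)
print(df2)
```

Because the helper works on a copy, df1 is left unchanged. To handle several words, you could call it once per (target, abbrev) pair and concatenate all the results in one go.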