I have two DataFrames, each with the columns ['abstract', 'text', 'label'].
If a target word appears in the text column of the first DataFrame, I want to replace that word with another one, build a new row from the result, and add that row to the second DataFrame.
This should be done for every row that contains the target word. For example, if 'beautiful' appears in the text column:
abstract: '123'
text: 'this is a beautiful day'
label: 'good'
then create the following row and add it to the other DataFrame:
abstract: 'bf'
text: 'this is a bf day'
label: 'beautiful'
CodePudding user response:
You can use pandas' vectorized string methods Series.str.contains and Series.str.replace:
import pandas as pd

df1 = pd.DataFrame({'abstract': ['123', 'other', 'more'],
                    'text': ['this is a beautiful day',
                             'this is not', 'beautiful too'],
                    'label': ['good', 'bad', 'good']})
df2 = pd.DataFrame(columns=df1.columns)

target = 'beautiful'
abbrev = 'bf'

# keep only the rows whose text contains the target (literal match, not a regex)
new_rows = df1[df1.text.str.contains(target, regex=False)].copy()
new_rows['abstract'] = abbrev
new_rows['text'] = new_rows.text.str.replace(target, abbrev, regex=False)
new_rows['label'] = target

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df2 = pd.concat([df2, new_rows])
df2
  abstract              text      label
0       bf  this is a bf day  beautiful
2       bf            bf too  beautiful
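
If you need this for several words, or want to match whole words only (so that 'beautifully' is left alone), the same idea can be wrapped in a function. This is only a sketch under some assumptions: make_replacement_rows and the replacements dict are hypothetical names introduced here, not part of pandas, and the abbreviations are assumed to be plain strings without backslashes, since they are used as regex replacement text.

```python
import re
import pandas as pd

def make_replacement_rows(df, replacements):
    """Build one new row per whole-word hit of each target in df['text'].

    `replacements` maps target word -> abbreviation (hypothetical helper input).
    """
    frames = []
    for target, abbrev in replacements.items():
        # \b restricts the match to whole words; re.escape protects regex metacharacters
        pattern = rf"\b{re.escape(target)}\b"
        # na=False treats missing text as "no match" so the mask stays boolean
        hits = df[df["text"].str.contains(pattern, regex=True, na=False)].copy()
        hits["abstract"] = abbrev
        hits["text"] = hits["text"].str.replace(pattern, abbrev, regex=True)
        hits["label"] = target
        frames.append(hits)
    if not frames:
        return pd.DataFrame(columns=df.columns)
    # ignore_index gives the new rows a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)

df1 = pd.DataFrame({
    "abstract": ["123", "other", "more"],
    "text": ["this is a beautiful day", "this is not", "beautifully done"],
    "label": ["good", "bad", "good"],
})
new_rows = make_replacement_rows(df1, {"beautiful": "bf"})
print(new_rows)
```

The result can then be added to the second DataFrame with pd.concat([df2, new_rows], ignore_index=True). Collecting the pieces in a list and concatenating once avoids growing a DataFrame row by row.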