I have a dataframe as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})
I would like to compare the two columns row-wise and, wherever 'text' and 'label' hold the same value in a row, replace the value in the 'label' column with the word 'same'.
Desired output:
text label
0 she is good same
1 she is bad she is good
So far, I have tried the following:
df['label'] = np.where(df.query("text == label"), df['label']== ' ',df['label']==df['label'] )
but it raises an error:
ValueError: Length of values (1) does not match length of index (2)
CodePudding user response:
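Your syntax is not correct; have a look at the documentation of numpy.where. It expects a condition with one boolean per row, followed by the value to use where the condition is True and the value to use where it is False.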
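To see why the original call fails, it may help to compare what `df.query` returns with what a column comparison returns. This is a minimal sketch using the frame from the question; the names `filtered` and `mask` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# query() filters the frame: it returns only the matching rows,
# not a True/False value for every row
filtered = df.query("text == label")
print(filtered.shape)

# eq() compares element-wise and keeps one boolean per row,
# which is the kind of condition np.where needs
mask = df['text'].eq(df['label'])
print(mask.tolist())
```

Because `filtered` has fewer rows than `df`, a result built from it cannot be assigned back to a column of `df`, which is consistent with the length mismatch in the error message.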
Check for equality between your two columns, and replace the values in your label column:
import numpy as np
df['label'] = np.where(df['text'].eq(df['label']),'same',df['label'])
Printing df then gives:
text label
0 she is good same
1 she is bad she is good
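For reference, here is a self-contained sketch of the same replacement, assuming the two-row frame from the question. It also shows `Series.mask` as a pandas-only alternative that should give the same result; the variable names `same`, `via_where` and `via_mask` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# One boolean per row: True where both columns hold the same value
same = df['text'].eq(df['label'])

# np.where(condition, value_if_true, value_if_false)
via_where = np.where(same, 'same', df['label'])

# pandas-only alternative: replace the values where the mask is True
via_mask = df['label'].mask(same, 'same')

df['label'] = via_mask
print(df)
```

Either form keeps the index aligned with `df`, so the result can be assigned straight back to the 'label' column.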