I have a data frame that contains a column named class. The class column holds three text values: 'positive', 'negative' and 'neutral'. I want to change 40% of the neutral rows to positive and 30% of them to negative, leaving the remaining 30% neutral, using pandas in Python.
CodePudding user response:
Setting up an example:
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col': np.random.choice(['positive', 'negative', 'neutral'], 1000)})
# col
# 0 positive
# 1 negative
# 2 positive
# 3 negative
# 4 negative
df.value_counts(normalize=True)
# positive 0.337
# negative 0.335
# neutral 0.328
Then we can get the index of the neutral rows, shuffle it, and split it by position:
# shuffle the neutral rows and keep their index labels
idx = df[df['col'].eq('neutral')].sample(frac=1).index
L = len(idx)
# the first 40% of the shuffled neutral rows become positive
df.loc[idx[:int(L*0.4)], 'col'] = 'positive'
# the next 30% become negative; the remaining 30% stay neutral
df.loc[idx[int(L*0.4):int(L*0.7)], 'col'] = 'negative'
Value counts after the replacement (as fractions):
>>> df.value_counts(normalize=True)
positive 0.468
negative 0.433
neutral 0.099
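If you need this more than once, the same shuffle-and-slice idea can be wrapped in a small helper. This is only a sketch: the function name `reassign_fraction` and its parameters are made up for illustration, it assumes the data frame has a unique index, and it rounds the cut points rather than truncating them with `int()`, so counts can differ by one row from the snippet above.

```python
import numpy as np
import pandas as pd


def reassign_fraction(df, column, source, targets, seed=None):
    """Return a copy of df where random shares of the `source` rows are relabeled.

    targets maps new label -> fraction of the source rows, for example
    {'positive': 0.4, 'negative': 0.3}; the remaining rows keep `source`.
    Hypothetical helper; assumes df has a unique index.
    """
    if sum(targets.values()) > 1 + 1e-9:
        raise ValueError("fractions must sum to at most 1")

    out = df.copy()
    rng = np.random.default_rng(seed)

    # index labels of the source rows, in random order
    idx = out.index[out[column].eq(source)].to_numpy()
    shuffled = rng.permutation(idx)

    n = len(shuffled)
    start = 0
    cumulative = 0.0
    for label, frac in targets.items():
        cumulative += frac
        # cut point on the shuffled index for this label
        stop = int(round(n * cumulative))
        out.loc[shuffled[start:stop], column] = label
        start = stop
    return out


example = pd.DataFrame({'col': ['neutral'] * 10 + ['positive'] * 5})
result = reassign_fraction(
    example, 'col', 'neutral', {'positive': 0.4, 'negative': 0.3}, seed=0
)
print(result['col'].value_counts())
```

Passing `seed` makes the choice of rows reproducible without touching NumPy's global random state, and working on a copy leaves the original data frame unchanged.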