Randomly change a fraction of one group's values to the values of other groups

I have a DataFrame with a column named class that contains three text values: 'positive', 'negative' and 'neutral'. I want to change 40% of the neutral rows to positive and 30% of the neutral rows to negative, leaving the remaining 30% neutral, using pandas in Python.

CodePudding user response:

Setting up an example:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({'col': np.random.choice(['positive', 'negative', 'neutral'], 1000)})

#         col
# 0  positive
# 1  negative
# 2  positive
# 3  negative
# 4  negative

df['col'].value_counts(normalize=True)
# positive    0.337
# negative    0.335
# neutral     0.328

Then we can get the index labels of the neutral rows, shuffle them, and split the shuffled labels into consecutive slices:

# get shuffled index of neutral
idx = df[df['col'].eq('neutral')].sample(frac=1).index
L = len(idx)

# replace first random 40%
df.loc[idx[:int(L*0.4)], 'col'] = 'positive'
# replace next random 30%
df.loc[idx[int(L*0.4):int(L*0.7)], 'col'] = 'negative'

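Value counts (as fractions); the neutral share drops from 0.328 to 0.099, i.e. about 30% of the original neutral rows remain:

>>> df['col'].value_counts(normalize=True)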
positive    0.468
negative    0.433
neutral     0.099
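
If you need this more than once, the same shuffle-and-slice idea can be wrapped in a small helper. This is only a sketch: the function name reassign_fraction and its parameters (col, source, targets, seed) are made up for illustration and are not part of pandas. It also computes each slice size as int(n * frac) separately, so rounding can differ by a row from the cumulative slicing above.

```python
import numpy as np
import pandas as pd


def reassign_fraction(df, col, source, targets, seed=None):
    """Hypothetical helper: relabel exact fractions of the `source` rows.

    `targets` maps a new label to the fraction of `source` rows that
    should receive it, e.g. {'positive': 0.4, 'negative': 0.3}.
    Rows not covered by any fraction keep the `source` label.
    """
    if sum(targets.values()) > 1 + 1e-9:
        raise ValueError("fractions must sum to at most 1")

    out = df.copy()  # leave the caller's frame untouched
    rng = np.random.default_rng(seed)

    # Shuffled index labels of the rows currently equal to `source`.
    mask = out[col].eq(source).to_numpy()
    idx = rng.permutation(out.index[mask])
    n = len(idx)

    start = 0
    for label, frac in targets.items():
        count = int(n * frac)  # truncates, like int(L*0.4) above
        out.loc[idx[start:start + count], col] = label
        start += count
    return out


# Small deterministic demo: 10 neutral, 5 positive, 5 negative rows.
demo = pd.DataFrame({'col': ['neutral'] * 10 + ['positive'] * 5 + ['negative'] * 5})
result = reassign_fraction(demo, 'col', 'neutral',
                           {'positive': 0.4, 'negative': 0.3}, seed=0)
print(result['col'].value_counts())
```

Which rows get relabelled depends on the seed, but the counts do not: 4 of the 10 neutral rows become positive, 3 become negative and 3 stay neutral.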