Below is my table. I want to use pandas to add some noise to the x
column only, but my current code does not work.
x y
-------
30 0
1 1
0 1
300 0
....
I only want to add noise to the rows where y == 1:
noise = np.random.normal(50, 10, ???)
Expected result:
x(float) y
----------------
30 0
1 noise 1
0 noise 1
300 0
....
CodePudding user response:
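Use DataFrame.loc with a boolean mask, and count the number of matching rows with sum (each True counts as 1), so the noise array has exactly one value per selected row: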
m = df.y==1
df.loc[m, 'x'] = np.random.normal(50, 10, m.sum())
print (df)
x y
0 30.000000 0
1 52.623817 1
2 56.042890 1
3 300.000000 0
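Note that the assignment above overwrites x with the random draw for the matching rows. If the goal is to add the noise to the existing value instead, a minimal sketch could look like the following. The sample frame is rebuilt from the question's table, and the seeded generator `rng` is an assumption added only to make the run reproducible:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({"x": [30, 1, 0, 300], "y": [0, 1, 1, 0]})

# Assumption: a seeded generator, used here only for reproducibility
rng = np.random.default_rng(0)

# Boolean mask of the rows that should receive noise
m = df["y"] == 1

# Cast to float first so the noisy values fit the column dtype
df["x"] = df["x"].astype(float)

# Add (rather than assign) one noise value per selected row
df.loc[m, "x"] += rng.normal(50, 10, m.sum())
print(df)
```

Rows where y is 0 keep their original x, now stored as float.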
Or generate an array with the same length as the DataFrame, then use Series.mask or numpy.where to apply it only where the condition holds:
df['x'] = df['x'].mask(df.y==1, np.random.normal(50, 10, len(df)))
#alternative
#df['x'] = np.where(df.y==1, np.random.normal(50, 10, len(df)), df['x'])
print (df)
x y
0 30.000000 0
1 37.968245 1
2 46.963821 1
3 300.000000 0
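The full-length approach also replaces x rather than adding to it, and it draws len(df) values even though only the rows with y == 1 use them. A sketch of an additive variant with numpy.where, again assuming the question's sample data and a seeded generator `rng` that is not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({"x": [30, 1, 0, 300], "y": [0, 1, 1, 0]})

# Assumption: a seeded generator, used here only for reproducibility
rng = np.random.default_rng(0)

# One noise value per row; values for rows with y != 1 are discarded
noise = rng.normal(50, 10, len(df))

# Where y == 1 take x + noise, otherwise keep x unchanged
df["x"] = np.where(df["y"] == 1, df["x"] + noise, df["x"])
print(df)
```

The loc version draws only as many values as it needs, while this version keeps the noise array aligned with the frame row by row, which can be easier to reason about when several conditions are combined.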