Pandas add noise to specific column

Time:09-07

This is my table below. I want to use pandas to add some noise to the x column only, but my current code does not work.

x     y
-------
30    0
1     1
0     1
300   0
....

I only want to add noise to x in the rows where y == 1, using noise = np.random.normal(50, 10, ???)

Expected result:

x (float)    y
--------------
30           0
1 + noise    1
0 + noise    1
300          0
....

CodePudding user response:

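Use DataFrame.loc with a boolean mask, and count the matching rows with sum so that the noise array has the right length. Note that this assignment replaces the selected x values with the drawn samples: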

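import numpy as np

# Boolean mask of the rows to change
m = df.y == 1
# m.sum() counts the True values, so one sample is drawn per selected row
df.loc[m, 'x'] = np.random.normal(50, 10, m.sum())
print(df)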
            x  y
0   30.000000  0
1   52.623817  1
2   56.042890  1
3  300.000000  0
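
The assignment above overwrites x with the drawn values. If the goal is to add the noise on top of the existing values instead, a minimal sketch might look like the following. The sample frame, the seed, and the explicit float cast are assumptions made for illustration; the cast is there so that float noise is never written into an integer column.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (assumed for illustration)
df = pd.DataFrame({"x": [30, 1, 0, 300], "y": [0, 1, 1, 0]})

# Cast to float up front so the column can hold the noisy values
df["x"] = df["x"].astype(float)
original = df["x"].copy()  # reference copy, only used for checking

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
m = df["y"] == 1
noise = rng.normal(50, 10, m.sum())  # one draw per row where y == 1

# Add the noise in place, only on the selected rows
df.loc[m, "x"] += noise
print(df)
```

Rows with y == 0 keep their original x values; rows with y == 1 are shifted by one sample each.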

Or generate an array with the same length as the DataFrame, then use Series.mask or numpy.where to apply it only where y == 1:

df['x'] = df['x'].mask(df.y==1, np.random.normal(50, 10, len(df)))
#alternative
#df['x'] = np.where(df.y==1, np.random.normal(50, 10, len(df)), df['x'])
print (df)
            x  y
0   30.000000  0
1   37.968245  1
2   46.963821  1
3  300.000000  0
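
As with the first approach, the full-length variant can also add the noise to the existing values rather than replace them. The sketch below is one possible way to do that, assuming the same sample frame as in the question and an arbitrary seed; the names x0 and shifted are hypothetical helpers. Note that a value is drawn for every row, and the draws for rows with y == 0 are simply discarded.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question (assumed for illustration)
df = pd.DataFrame({"x": [30, 1, 0, 300], "y": [0, 1, 1, 0]})

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility
m = df["y"] == 1
noise = rng.normal(50, 10, len(df))  # one draw per row, used only where m is True

x0 = df["x"].astype(float)  # original values as floats
shifted = x0 + noise        # every row shifted; filtered by the mask below

# Series.mask replaces values where the condition is True
df["x"] = x0.mask(m, shifted)

# Equivalent result with numpy.where (returns a NumPy array)
via_where = np.where(m, shifted, x0)
print(df)
```

Both forms should give the same column; Series.mask keeps the pandas index, while numpy.where returns a plain array.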