Replace str value in pd df by sampling from a pandas array

Time:12-15

I have a pandas df

df = pd.DataFrame({'A': [0.1, 0.1, 0.1, 0.1, 'X'], 'B': [0.1, 0.1, 'X', 0.1, 0.1], 'C': [0.1, 'X', 'X', 'X', 'X']})

 A    B    C
 0.1  0.1  0.1
 0.1  0.1   X
 0.1   X    X
 0.1  0.1   X
  X   0.1   X

and an array

<PandasArray> [0.9999999999999304, 0.9999973764241584, 0.9999997377248664, 0.9615117313882438, 0.871479832883895, 0.9999999999998652, 0.9999999999999994, 0.9999029359407972, 0.999999984174712, 0.9944689702907784] Length: 10, dtype: float64

I would like to replace the values X by sampling from the array such that the distribution of the values in the array is represented in the df in the locations with the value X

I have tried

df[df == 'X'] = np.random.choice(arr, replace=True)

which gives this output

 A    B    C
 0.1  0.1  0.1
 0.1  0.1  1.0
 0.1  1.0  1.0
 0.1  0.1  1.0
 1.0  0.1  1.0

Does this randomly sample from the array and why are the values rounded? I would like to replace with the exact values from the array.

CodePudding user response:

Does this randomly sample from the array?

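Yes, the value is drawn at random, but only once: without a size argument, np.random.choice returns a single scalar, and that one value is written into every X cell. To reflect the distribution of the array, draw as many values as there are X cells by passing size.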
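
A minimal sketch of one way to draw a separate value for every X cell, assuming the array is available as a NumPy array (a PandasArray can be converted with np.asarray). The names mask, samples and filled, and the fixed seed, are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.1, 0.1, 0.1, 0.1, 'X'],
                   'B': [0.1, 0.1, 'X', 0.1, 0.1],
                   'C': [0.1, 'X', 'X', 'X', 'X']})
arr = np.array([0.9999999999999304, 0.9999973764241584, 0.9999997377248664,
                0.9615117313882438, 0.871479832883895, 0.9999999999998652,
                0.9999999999999994, 0.9999029359407972, 0.999999984174712,
                0.9944689702907784])

# Boolean mask of the cells that hold the placeholder 'X'
mask = (df == 'X').to_numpy()

# Seeded generator only so the sketch is reproducible (assumption, not required)
rng = np.random.default_rng(0)

# One independent draw per masked cell, sampling with replacement
samples = rng.choice(arr, size=mask.sum(), replace=True)

# Fill the masked positions on a copy of the underlying object array,
# then rebuild the frame and convert the columns to float
values = df.to_numpy(copy=True)
values[mask] = samples
filled = pd.DataFrame(values, index=df.index, columns=df.columns).astype(float)

print(filled.to_dict('list'))
```

With only 10 values in arr and 6 cells to fill, repeats are still possible because sampling is done with replacement; each cell is nevertheless drawn independently.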

Why are the values rounded?

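It is only a display problem: by default pandas prints floats with 6 decimal places, so a value such as 0.9999997377248664 is shown as 1.0. The stored values are exact, which you can see by converting to a list: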

df[df == 'X'] = np.random.choice(arr, replace=True)
print (df.to_dict('list'))

{'A': [0.1, 0.1, 0.1, 0.1, 0.9999997377248664],
 'B': [0.1, 0.1, 0.9999997377248664, 0.1, 0.1], 
 'C': [0.1, 0.9999997377248664, 0.9999997377248664, 0.9999997377248664, 0.9999997377248664]}
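
The stored precision can also be checked without converting to a list, by temporarily raising the display precision. A small sketch, assuming the pandas display.precision option (default 6); the two-row frame is a made-up stand-in for the filled df.

```python
import pandas as pd

# Hypothetical stand-in for one column of the filled DataFrame
demo = pd.DataFrame({'C': [0.1, 0.9999997377248664]})

# The stored value is exact regardless of how it is printed
stored = demo.loc[1, 'C']

# Raise the display precision only inside this block
with pd.option_context('display.precision', 16):
    text = demo.to_string()

print(text)
```

pd.set_option('display.precision', 16) would make the same change for the whole session.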