I have a pandas DataFrame:
df = pd.DataFrame({'A': [0.1, 0.1, 0.1, 0.1, 'X'], 'B': [0.1, 0.1, 'X', 0.1, 0.1], 'C': [0.1, 'X', 'X', 'X', 'X']})
A B C
0.1 0.1 0.1
0.1 0.1 X
0.1 X X
0.1 0.1 X
X 0.1 X
and an array
<PandasArray> [0.9999999999999304, 0.9999973764241584, 0.9999997377248664, 0.9615117313882438, 0.871479832883895, 0.9999999999998652, 0.9999999999999994, 0.9999029359407972, 0.999999984174712, 0.9944689702907784] Length: 10, dtype: float64
I would like to replace each X with a value sampled from the array, so that the values placed in the X locations reflect the distribution of the values in the array.
I have tried
df[df == 'X'] = np.random.choice(arr, replace=True)
which gives this output
A B C
0.1 0.1 0.1
0.1 0.1 1.0
0.1 1.0 1.0
0.1 0.1 1.0
1.0 0.1 1.0
Does this randomly sample from the array and why are the values rounded? I would like to replace with the exact values from the array.
CodePudding user response:
Does this randomly sample from the array?
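Yes, but only once: without a size argument, np.random.choice returns a single value, and that one value is then assigned to every X.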
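To give each X its own draw, pass a size equal to the number of X cells. Here is a minimal sketch, assuming the array is first converted to a plain NumPy array; the names mask, values, result and the fixed seed are only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.1, 0.1, 0.1, 0.1, 'X'],
                   'B': [0.1, 0.1, 'X', 0.1, 0.1],
                   'C': [0.1, 'X', 'X', 'X', 'X']})
arr = np.array([0.9999999999999304, 0.9999973764241584, 0.9999997377248664,
                0.9615117313882438, 0.871479832883895, 0.9999999999998652,
                0.9999999999999994, 0.9999029359407972, 0.999999984174712,
                0.9944689702907784])

# Boolean mask of the cells that hold 'X'
mask = (df == 'X').to_numpy()

# Draw one value per 'X' cell (seed fixed only to make the sketch repeatable)
rng = np.random.default_rng(0)
draws = rng.choice(arr, size=mask.sum(), replace=True)

# Fill the masked cells on a copy of the underlying object array
values = df.to_numpy(dtype=object, copy=True)
values[mask] = draws

# Rebuild the frame; every cell is numeric now, so cast to float
result = pd.DataFrame(values, index=df.index, columns=df.columns).astype(float)
print(result.to_dict('list'))
```

With only six X cells the sample is small, so it can only roughly follow the distribution of the array.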
Why are the values rounded?
It is only a display issue: pandas rounds floats when it prints a DataFrame. Converting to a dict of lists shows the values that are actually stored:
df[df == 'X'] = np.random.choice(arr, replace=True)
print(df.to_dict('list'))
{'A': [0.1, 0.1, 0.1, 0.1, 0.9999997377248664],
'B': [0.1, 0.1, 0.9999997377248664, 0.1, 0.1],
'C': [0.1, 0.9999997377248664, 0.9999997377248664, 0.9999997377248664, 0.9999997377248664]}
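If you would rather see the full values in the printed DataFrame itself, raising the pandas display.precision option should do it. Here is a small sketch, assuming the default precision of 6; the frame demo and the names short and full are only illustrative:

```python
import pandas as pd

# Small illustrative frame holding one of the sampled values
demo = pd.DataFrame({'C': [0.1, 0.9999997377248664]})

# Default display precision: the value is shown rounded
short = demo.to_string()

# Temporarily raise the precision; the stored data is unchanged either way
with pd.option_context('display.precision', 16):
    full = demo.to_string()

print(short)
print(full)
```

Using option_context keeps the change local to the with block; pd.set_option('display.precision', 16) would apply it for the rest of the session.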