Random sampling from a dataframe

Time:03-30

I want to generate a 2x6 dataframe which represents a rack. Half of this dataframe is filled with storage items and the other half with retrieval items. What I want to do is randomly choose half of these 12 items and say that they are storage items, and that the others are retrieval items. How can I choose them randomly?

I tried random.sample, but it chooses random columns. I actually want to choose individual items at random.

Answer:

Assuming this input:

   0  1  2  3   4   5
0  0  1  2  3   4   5
1  6  7  8  9  10  11
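To reproduce this input, one way (assuming the cells simply hold the values 0 to 11, filled row by row) is:

```python
import numpy as np
import pandas as pd

# 2 rows x 6 columns holding 0..11 row by row (assumed example data)
df = pd.DataFrame(np.arange(12).reshape(2, 6))
print(df)
```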

You can build a random boolean NumPy array that selects/masks exactly half of the values:

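import numpy as np
a = np.repeat([True, False], df.size // 2)  # half True, then half False
np.random.shuffle(a)                         # shuffle the flat array in place
a = a.reshape(df.shape)                      # back to the dataframe's 2x6 layout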
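The same kind of mask can also be built with NumPy's newer Generator API (np.random.default_rng) instead of the legacy global np.random.shuffle. This is a minimal sketch: make_half_mask is a hypothetical helper name, and the seed is an arbitrary choice for reproducibility. It picks df.size // 2 distinct flat positions and marks them True.

```python
import numpy as np
import pandas as pd

def make_half_mask(shape, rng):
    """Return a boolean array of `shape` with exactly half the cells True."""
    size = int(np.prod(shape))
    if size % 2:
        raise ValueError("need an even number of cells to split in half")
    flat = np.zeros(size, dtype=bool)
    # choose size // 2 distinct flat positions and mark them True
    flat[rng.choice(size, size // 2, replace=False)] = True
    return flat.reshape(shape)

rng = np.random.default_rng(42)  # arbitrary seed, for reproducible output
df = pd.DataFrame(np.arange(12).reshape(2, 6))
a = make_half_mask(df.shape, rng)
print(a)
```

Sampling positions without replacement guarantees exactly six True cells, just like shuffling a half-True array does.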

Then select your two groups. df.mask(a) hides (sets to NaN) the cells where a is True, and df.where(a) keeps only those cells:

df.mask(a)
     0   1    2    3   4     5
0  NaN NaN  NaN  3.0   4   NaN
1  6.0 NaN  8.0  NaN  10  11.0

df.where(a)
     0  1    2    3   4    5
0  0.0  1  2.0  NaN NaN  5.0
1  NaN  7  NaN  9.0 NaN  NaN
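Since the goal is to tag each rack slot as storage or retrieval, the same boolean mask can also be turned directly into labels with np.where, or used to pull out the items of each group. A minimal sketch; the "storage"/"retrieval" strings and the convention that True means storage are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(2, 6))

# boolean mask with exactly half the cells True, at random positions
a = np.repeat([True, False], df.size // 2)
np.random.shuffle(a)
a = a.reshape(df.shape)

# True -> "storage", False -> "retrieval" (label convention assumed)
labels = pd.DataFrame(np.where(a, "storage", "retrieval"),
                      index=df.index, columns=df.columns)
print(labels)

# the items in each group, as flat arrays
storage_items = df.to_numpy()[a]
retrieval_items = df.to_numpy()[~a]
```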

If you simply want 6 random elements, use numpy.random.choice without replacement:

np.random.choice(df.to_numpy().ravel(), 6, replace=False)

Example output:

array([ 4,  5, 11,  7,  8,  3])
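If you also need to know where each sampled item sits in the rack, one option is to sample flat positions rather than values and convert them back to (row, column) pairs with np.unravel_index. A sketch using np.random.default_rng (the seed is arbitrary):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(2, 6))
rng = np.random.default_rng(0)  # arbitrary seed

# 6 distinct flat positions out of df.size cells
flat_idx = rng.choice(df.size, 6, replace=False)
# convert flat (row-major) positions to row and column indices
rows, cols = np.unravel_index(flat_idx, df.shape)

values = df.to_numpy()[rows, cols]
for r, c, v in zip(rows, cols, values):
    print(f"row {r}, col {c}: {v}")
```

Because the positions are unique, the six values are guaranteed to come from six different cells.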