Random subsetting a data frame from a larger dataframe

Time:05-21

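import numpy as np
import pandas as pd

height = 10
width = 10
n = height * width  # total number of rack positions (100)
column = list(range(1, width + 1))
indices = list(range(1, height + 1))

# Fill the rack with the item IDs 0..n-1 in random order, without repeats
Rack2 = pd.DataFrame(np.random.choice(np.arange(n), size=(height, width), replace=False), index=indices, columns=column)
Rack = Rack2.sort_index(ascending=False)

# Boolean flags: half True, half False
a = np.repeat([True, False], Rack.size // 2)
np.random.shuffle(a)  # shuffles in place and returns None, so do not assign the result
a = a.reshape(Rack.shape)

SI = Rack.mask(a)   # storage items: cells where a is False
RI = Rack.where(a)  # retrieval items: cells where a is True

StorageSet = SI.stack()
ss = StorageSet.index  # assumed: the original read dfStorage.index, but dfStorage is not defined here

RetrievalSet = RI.stack()
tt = RetrievalSet.index  # assumed: the original read D3.index, but D3 is not defined here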

In the Python code above, there is a 10x10 rack. Half of the rack (50 items) consists of storage items and the other half consists of retrieval items.
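
To make the half-and-half split concrete, here is a minimal sketch of how `mask` and `where` divide a frame between them. The 2x2 frame `df` and the array `flags` are hypothetical and only serve as an illustration; they are not part of the rack code.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x2 frame, used only for illustration
df = pd.DataFrame([[10, 11], [12, 13]])
flags = np.array([[True, False], [False, True]])

kept_where_false = df.mask(flags)   # NaN wherever flags is True
kept_where_true = df.where(flags)   # NaN wherever flags is False

# Count the cells that survive in each frame
n_storage = int(kept_where_false.notna().sum().sum())
n_retrieval = int(kept_where_true.notna().sum().sum())
print(n_storage, n_retrieval)
```

Each cell survives in exactly one of the two frames, which is why a half-True boolean array gives a 50/50 split.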

I do not want the split to be half of the rack size, though. For example, in a 10x10 rack I want 30 of the items to be storage items, and 30 of the remaining 70 items to be retrieval items. How can I do this?

CodePudding user response:

You could do this with a couple of revisions to the code. First, change the initialization of a:

samp_size = 30
a = np.hstack([np.repeat(0, samp_size), np.repeat(1, samp_size), np.repeat(np.nan, n - (2 * samp_size)])

Then, after shuffling and reshaping a as in your original code, you can get SI and RI as:

SI = Rack.where(a==0)
RI = Rack.where(a==1)

The rest of your code should work the same.
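
For reference, here is a self-contained sketch that puts the pieces together. It is one possible arrangement, not the only one: the helper `split_rack` and its parameters `n_storage`, `n_retrieval`, and `seed` are names made up for this example, it uses `np.random.default_rng` instead of `np.random.shuffle`, and it labels unused cells with -1 rather than NaN.

```python
import numpy as np
import pandas as pd

def split_rack(rack, n_storage, n_retrieval, seed=None):
    """Hypothetical helper: return (storage, retrieval) frames whose
    non-NaN cells are disjoint random subsets of the rack."""
    total = rack.size
    if n_storage + n_retrieval > total:
        raise ValueError("requested more items than the rack holds")
    rng = np.random.default_rng(seed)
    # Label each cell: 0 = storage, 1 = retrieval, -1 = neither
    labels = np.full(total, -1)
    labels[:n_storage] = 0
    labels[n_storage:n_storage + n_retrieval] = 1
    # Shuffle the labels and lay them out in the shape of the rack
    labels = rng.permutation(labels).reshape(rack.shape)
    return rack.where(labels == 0), rack.where(labels == 1)

height, width = 10, 10
n = height * width
rng = np.random.default_rng(0)
# Rack filled with the item IDs 0..n-1 in random order
Rack = pd.DataFrame(rng.permutation(n).reshape(height, width),
                    index=range(1, height + 1), columns=range(1, width + 1))
Rack = Rack.sort_index(ascending=False)

SI, RI = split_rack(Rack, 30, 30, seed=1)
# dropna() is applied explicitly so that only the selected cells remain
StorageSet = SI.stack().dropna()
RetrievalSet = RI.stack().dropna()
print(len(StorageSet), len(RetrievalSet))
```

Because both label groups come from the same shuffled array, no rack position can end up in both sets, and the two sizes can be chosen independently as long as their sum does not exceed the rack size.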
