Randomly selecting subsamples from dataframe using python

Time:09-27

I have a dataframe df. I would like to randomly select 6 subsamples, each of a fixed size (100 rows), with no rows repeated between them. So far, I have written the following code:

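df_ = df.sample(n=6000)  # random sample of 6000 rows
n = 6  # number of subsamples needed
size = int(df_.shape[0]/n)  # rows per chunk
chunks = list()
for i in range(0, df.shape[0], size):  # loops over the original df
    chunks.append(df.iloc[i:i + size])  # slices the original df, not df_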

But when I select a sample, say subsample_1 = chunks[1], the rows are not random but in their original order. Any advice on how to select 6 random subsamples from a given df that do not share any rows?
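A minimal sketch of the loop-based approach with the chunks taken from the shuffled frame: slicing df_ instead of df, and drawing exactly 6 × 100 rows, gives six random, non-overlapping chunks. The toy DataFrame and random_state=42 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the real df: 10,000 rows with a unique id column
df = pd.DataFrame({"id": range(10_000), "value": range(10_000)})

n = 6       # number of subsamples
size = 100  # rows per subsample

# Draw n * size distinct rows in random order (sample uses replace=False by default)
df_ = df.sample(n=n * size, random_state=42)

# Slice the shuffled df_, not the original df, so each chunk holds random rows
chunks = [df_.iloc[i:i + size] for i in range(0, df_.shape[0], size)]

subsample_1 = chunks[1]
print(len(chunks), subsample_1.shape)  # prints: 6 (100, 2)
```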

Answer:

Edit: After reading your comment and as @mozway mentioned:

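You can use sample to draw the total number of rows you need (the number of subsets times the subset size) in random order. Because sample draws without replacement by default, no row is picked twice. Then use np.array_split to split the result into the desired number of subsets: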

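import numpy as np

num = 6  # number of subsamples
# Draw num * 100 distinct rows in random order (df must have at least that many rows).
# Note: passing both n and frac raises a ValueError, so only n is given here.
df_shuffled = df.sample(n=num * 100)

chunks = np.array_split(df_shuffled, num)  # split into num DataFrames of 100 rows each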

chunks is a list of num DataFrames with 100 rows each, and no row appears in more than one of them.
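If you want reproducible draws or prefer not to pass a DataFrame to np.array_split, one alternative is to draw num * size distinct row positions yourself with NumPy's Generator.choice(..., replace=False) and reshape them into a num × size grid. Since every position is distinct, the subsamples cannot share rows. This is a sketch under assumptions: the helper name random_subsamples, its seed argument, and the toy DataFrame are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd


def random_subsamples(df, num, size, seed=None):
    """Hypothetical helper: return `num` DataFrames of `size` rows, no row shared."""
    total = num * size
    if total > len(df):
        raise ValueError(f"need {total} rows but df has only {len(df)}")
    rng = np.random.default_rng(seed)
    # Without replacement: `total` distinct row positions in random order
    positions = rng.choice(len(df), size=total, replace=False)
    # Reshape to (num, size): row k holds the positions for subsample k
    return [df.iloc[row] for row in positions.reshape(num, size)]


# Toy data standing in for the real df
df = pd.DataFrame({"id": range(1_000), "value": np.arange(1_000) * 0.5})
chunks = random_subsamples(df, num=6, size=100, seed=0)
print([len(c) for c in chunks])  # prints: [100, 100, 100, 100, 100, 100]
```

Passing the same seed returns the same six subsamples, which makes the split easy to reproduce.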
