I have a dataframe df. I would like to randomly select 6 subsamples, each of a fixed size (100 rows), with no rows repeated between them. So far, I have written the following code:
df_ = df.sample(n=6000)
n = 6  # number of subsamples needed
size = int(df_.shape[0] / n)
chunks = list()
for i in range(0, df.shape[0], size):
    chunks.append(df.iloc[i:i + size])
But when I select a subsample, say subsample_1 = chunks[1], the rows are not random but in their original order. Any advice on how to select 6 random subsamples from the given df that do not share any rows?
CodePudding user response:
Edit: After reading your comment and as @mozway mentioned:
You can use sample to draw a random selection of rows without repetition, asking for a row count that is a multiple of the subsample size you want. Then use np.array_split to split that selection into the desired number of subsets, like this:
import numpy as np

num = 6
df_shuffled = df.sample(n=num * 100)  # draws 600 distinct rows, returned in random order (n and frac cannot be passed together)
chunks = np.array_split(df_shuffled, num)
chunks will give you a list of the num required DataFrames.
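To check that the subsamples really are disjoint, here is a minimal self-contained sketch of the same idea. The toy df, the size variable and random_state are assumptions added for illustration, not part of the original question. It uses plain iloc slicing in place of np.array_split, which should give the same blocks when the row count divides evenly. Note that "no repeats" here means no row is picked twice; if df itself contains duplicate values, those can still appear in different subsamples.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's df (10,000 rows).
df = pd.DataFrame({"value": np.arange(10_000)})

num = 6      # number of subsamples
size = 100   # rows per subsample (assumed from the question)

# Draw num * size distinct rows; sample() returns them in random order.
# random_state is only here to make the sketch reproducible.
picked = df.sample(n=num * size, random_state=0)

# Slice the shuffled selection into consecutive blocks of `size` rows.
chunks = [picked.iloc[i * size:(i + 1) * size] for i in range(num)]

# Check: every subsample has `size` rows and no row index appears twice
# (this relies on df having a unique index, as the toy df does).
all_index = pd.concat(chunks).index
print(len(chunks), [len(c) for c in chunks], all_index.is_unique)
```

Because sample draws without replacement by default, the 600 rows are distinct, so cutting them into consecutive blocks cannot put the same row into two subsamples.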