Take random samples from the data with different number each time

Time:11-28

I have a pandas dataframe that I want to draw random samples from. I want to pick 10 random samples first, then 20, 30, 40, and 50 (without replacement). I'm trying to do it with a for loop, although I don't know how good this is, because a list can't contain dataframes, right? (My coding is better in R, and there lists can contain dataframes.)

number = [10,20,30,40,50]
sample = []
for i in range(len(number)):
    sample[i].append(data.sample(n = number[i]))

The error is IndexError: list index out of range

I don't want to copy and paste the code, so what is the right way to do it?

CodePudding user response:

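The loop bounds are not the problem: range(len(number)) yields 0, 1, 2, 3, 4, which are exactly the five valid indices of number. The error comes from sample[i]. sample is an empty list, so sample[0] does not exist yet and indexing it raises IndexError. Call append on the list itself instead: sample.append(data.sample(n = number[i])). A Python list can hold dataframes, just like a list in R.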
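
A minimal sketch of a working loop, in which each sample is added with sample.append(...) rather than through sample[i]. The dataframe data below is a hypothetical stand-in with 100 rows, since the question does not show the real one; any frame with at least 50 rows should behave the same way.

```python
import pandas as pd

# Hypothetical stand-in for the asker's dataframe (not from the question).
data = pd.DataFrame({"value": range(100)})

number = [10, 20, 30, 40, 50]
sample = []
for n in number:
    # DataFrame.sample draws without replacement by default (replace=False).
    sample.append(data.sample(n=n))

# Each list element is a dataframe with the requested number of rows.
sizes = [len(s) for s in sample]
print(sizes)
```

Iterating over number directly avoids the index variable altogether. Note that each sample is drawn independently from the full dataframe, so the same row can appear in more than one sample.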

CodePudding user response:

You could use the randint function from the random module to pick a random element of the list number, so that each iteration draws a sample of a randomly chosen size (sizes may repeat):

import random
number = [10,20,30,40,50]
sample = []
for i in range(len(number)):
    # pick one of the sizes at random, then draw that many rows
    sample.append(data.sample(n = number[random.randint(0, len(number)-1)]))

Update:

Assuming you have this dataframe for a movie ratings dataset:

import pandas as pd

data = [['avengers', 5.4, 'PG-13'],
['captain america', 6.7, 'PG-13'],
['spiderman', 7, 'R'],
['daredevil', 8.2, 'R'],
['iron man', 8.6, 'PG-13'],
['deadpool', 10, 'R']]

df = pd.DataFrame(data, columns=['title', 'score', 'rating'])

You can take random samples from it using the sample method:

# taking random 3 records from dataframe
samples = df.sample(3)

Output:

             title  score rating
1  captain america    6.7  PG-13
5         deadpool   10.0      R
3        daredevil    8.2      R

Another execution:

       title  score rating
4   iron man    8.6  PG-13
0   avengers    5.4  PG-13
2  spiderman    7.0      R
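
To get several samples of different sizes in one go, one option is a dict keyed by sample size. This is only a sketch: the sizes 2, 3 and 4 are illustrative (the example frame has just six rows), the name samples_by_size is made up here, and random_state is an optional argument that makes the draws repeatable.

```python
import pandas as pd

# Same movie ratings frame as above, rebuilt so this block runs on its own.
df = pd.DataFrame(
    [['avengers', 5.4, 'PG-13'],
     ['captain america', 6.7, 'PG-13'],
     ['spiderman', 7, 'R'],
     ['daredevil', 8.2, 'R'],
     ['iron man', 8.6, 'PG-13'],
     ['deadpool', 10, 'R']],
    columns=['title', 'score', 'rating'],
)

sizes = [2, 3, 4]  # illustrative sizes; each must be <= len(df)

# One independent sample per size, drawn without replacement.
samples_by_size = {n: df.sample(n=n, random_state=0) for n in sizes}

for n, s in samples_by_size.items():
    print(n, s.shape)
```

Dropping random_state gives a different draw on each run, as in the two outputs shown above.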