I have a pandas DataFrame that I want to draw random samples from. First I want to pick 10 random samples, then 20, 30, 40, and 50 (without replacement). I'm trying to do it with a for loop, although I don't know how good this is, because a list can't contain DataFrames, right? (My coding is better in R, and there lists can contain data frames.)
number = [10,20,30,40,50]
sample = []
for i in range(len(number)):
    sample[i].append(data.sample(n = number[i]))
And the error is IndexError: list index out of range
I don't want to copy and paste the code, so what is the right way to do it?
CodePudding user response:
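The loop range is not the problem: range(len(number)) yields the indices 0 through 4, which is exactly five iterations, so there is no need for range(len(number)-1) (that would skip the last sample size). The IndexError comes from sample[i].append(...): sample is an empty list, so sample[0] does not exist yet. Call sample.append(...) on the list itself instead. A Python list can hold DataFrames, just like a list in R.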
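For reference, here is a minimal sketch of the loop appending each sample directly to the list. The DataFrame `data` below is a made-up stand-in with 100 rows and a single hypothetical column `value`, and the dictionary `samples_by_size` is only an optional alternative; neither comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the asker's DataFrame: 100 rows, one column.
data = pd.DataFrame({"value": range(100)})

number = [10, 20, 30, 40, 50]
sample = []  # a plain Python list can hold DataFrames

for n in number:
    # Append to the list itself; indexing sample[i] fails on an empty list.
    # DataFrame.sample draws without replacement by default (replace=False).
    sample.append(data.sample(n=n))

# Optional alternative: key each sample by its size for easier lookup.
samples_by_size = {n: data.sample(n=n) for n in number}

print([len(s) for s in sample])  # [10, 20, 30, 40, 50]
```

Iterating over `number` directly avoids the index bookkeeping altogether, and `sample[0]` through `sample[4]` then hold the five DataFrames.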
CodePudding user response:
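You could use the randint function from the random module to choose a random element from the list number. Note that this picks one of the sizes at random on each iteration, rather than going through 10, 20, 30, 40, and 50 in order: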
import random
number = [10,20,30,40,50]
sample = []
for i in range(len(number)):
    # use a randomly chosen entry of number as the sample size
    sample.append(data.sample(n = number[random.randint(0, len(number) - 1)]))
Update:
Assuming you have this DataFrame for a movie ratings dataset:
import pandas as pd

data = [['avengers', 5.4, 'PG-13'],
        ['captain america', 6.7, 'PG-13'],
        ['spiderman', 7, 'R'],
        ['daredevil', 8.2, 'R'],
        ['iron man', 8.6, 'PG-13'],
        ['deadpool', 10, 'R']]
df = pd.DataFrame(data, columns=['title', 'score', 'rating'])
You can take random samples from it using the sample method:
# taking random 3 records from dataframe
samples = df.sample(3)
Output:
             title  score rating
1  captain america    6.7  PG-13
5         deadpool   10.0      R
3        daredevil    8.2      R
Another execution:
       title  score rating
4   iron man    8.6  PG-13
0   avengers    5.4  PG-13
2  spiderman    7.0      R
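The two outputs differ because sample draws new rows on every call. If repeatable samples are needed, one option is the random_state parameter of DataFrame.sample. The sketch below rebuilds the same movie table under the name `rows` (a name chosen here for illustration), and the seed value 42 is arbitrary.

```python
import pandas as pd

# Same movie table as above; `rows` is just an illustrative name.
rows = [['avengers', 5.4, 'PG-13'],
        ['captain america', 6.7, 'PG-13'],
        ['spiderman', 7, 'R'],
        ['daredevil', 8.2, 'R'],
        ['iron man', 8.6, 'PG-13'],
        ['deadpool', 10, 'R']]
df = pd.DataFrame(rows, columns=['title', 'score', 'rating'])

# With the same seed, two calls return the same rows in the same order.
first = df.sample(3, random_state=42)
second = df.sample(3, random_state=42)

print(first.equals(second))  # True
```

Leaving random_state out restores the behaviour shown above, where each execution returns a different set of rows.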