Here's a simplified version of the data frame I have:
customer_ID value_1 value_2 ....
1 0.5 0.2
1 ... ...
1
1
2
2
3
3
3
....
Suppose the data frame above contains 1000 unique customers, and I want a sample of it that contains only 100 of them. The customer_ID values are arbitrary and I don't know which ID belongs to the 100th customer, so I can't simply keep the rows with customer_ID <= 100. How should I do it?
Thanks!
CodePudding user response:
- First, collect the unique customer IDs (this returns a NumPy array):
unique_ID = df.customer_ID.unique()
- Then randomly choose 100 of them. random.sample needs a sequence such as a list, so convert the array first:
import random
random_ID = random.sample(list(unique_ID), 100)
- Finally, filter your data frame to the rows whose customer_ID is in that list:
df[df['customer_ID'].isin(random_ID)]
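For reference, the three steps above can also be done with pandas alone. This is only a sketch: the helper name sample_customers, its seed parameter, and the toy data frame are made up for illustration, and it assumes the ID column is called customer_ID as in the question.

```python
import numpy as np
import pandas as pd


def sample_customers(df, n_customers, id_col="customer_ID", seed=None):
    """Return every row belonging to n_customers randomly chosen IDs.

    Hypothetical helper, not part of pandas.
    """
    # One entry per distinct customer.
    unique_ids = df[id_col].drop_duplicates()
    # Draw IDs without replacement; seed makes the draw reproducible.
    chosen = unique_ids.sample(n=n_customers, random_state=seed)
    # Keep all rows whose ID was drawn.
    return df[df[id_col].isin(chosen)]


# Toy data (an assumption): 1000 customers with 3 rows each.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "customer_ID": np.repeat(np.arange(1, 1001), 3),
    "value_1": rng.random(3000),
    "value_2": rng.random(3000),
})

subset = sample_customers(toy, 100, seed=42)
print(subset["customer_ID"].nunique())
```

Passing a seed is optional; it just makes the same 100 customers come back on every run, which is handy when you need to reproduce a result.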
Hope it helps.