split python dataframe with certain number of observations

Time:12-13

Here's a simplified version of the data frame that I have:

customer_ID value_1 value_2 ....
1            0.5    0.2
1            ...    ...
1
1
2
2
3
3
3
....

Suppose the data frame above contains 1000 unique customers and I only want a sample of it containing 100 of those customers. The customer_ID values are arbitrary, so I don't know which one is the 100th customer, which means I can't simply put the rows with customer_ID <= 100 into one data frame. How should I do it?

Thanks!

CodePudding user response:

  1. Collect all the unique customer IDs into a list:

unique_ID = df.customer_ID.unique().tolist()

  2. Then randomly choose 100 of them into another list (random.sample needs a sequence such as a list, not the NumPy array that unique() returns):

import random

random_ID = random.sample(unique_ID, 100)

  3. Finally, filter your dataframe with that list:

df[df['customer_ID'].isin(random_ID)]
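
Put together, the three steps might look like the sketch below. The toy frame, the names n_customers, sample_df and rest_df, and the seed are assumptions made up for illustration. The question only shows customer_ID and some value columns, so swap in your own df and set n_customers to 100.

```python
import random

import pandas as pd

# Hypothetical toy data: 10 customers with 3 rows each (stand-in for the real frame).
df = pd.DataFrame({
    "customer_ID": [cid for cid in range(101, 111) for _ in range(3)],
    "value_1": [0.5] * 30,
})

n_customers = 4  # illustrative; this would be 100 for the case in the question

random.seed(0)  # optional, only here so the draw is repeatable

# Step 1: unique IDs as a plain list (random.sample needs a sequence).
unique_ids = df["customer_ID"].unique().tolist()

# Step 2: draw n_customers distinct IDs without replacement.
sampled_ids = random.sample(unique_ids, n_customers)

# Step 3: keep every row of the sampled customers; the other rows form the remainder.
mask = df["customer_ID"].isin(sampled_ids)
sample_df = df[mask]
rest_df = df[~mask]
```

Because the filter works on IDs rather than rows, each sampled customer keeps all of their observations, and rest_df holds the remaining customers if you need both parts of the split.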

Hope it helps.
