My data has about 9000 observations and 20 features (edit: it is a pandas DataFrame). I took a sample of 200 observations like this and ran some analysis on it:
sample_data = data.sample(n = 200)
Now I want to randomly take a sample of 1000 observations from the original data, with none of the observations that showed up in the previous n = 200 sample. How do I do that?
CodePudding user response:
If you are using a pandas.DataFrame, you can simply drop the previously sampled rows and sample 1000 new ones from the remaining data:
prev_sample_index = sample_data.index
filtered_data = data.drop(prev_sample_index)
new_sample = filtered_data.sample(n = 1000)
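For anyone who wants to check this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame below is made-up random data standing in for the original (the column names and the `random_state` seeds are assumptions, added only to make the run reproducible), and it assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original data: 9000 rows, 20 features.
rng = np.random.default_rng(0)
data = pd.DataFrame(
    rng.normal(size=(9000, 20)),
    columns=[f"feature_{i}" for i in range(20)],
)

# First sample, as in the question (seed added for reproducibility).
sample_data = data.sample(n=200, random_state=1)

# Drop the rows already sampled (matched by index label), then sample again.
remaining = data.drop(sample_data.index)
new_sample = remaining.sample(n=1000, random_state=2)

# Check that the two samples share no index labels.
overlap = new_sample.index.intersection(sample_data.index)
print(len(remaining), len(new_sample), len(overlap))
```

Note that drop matches rows by index label, so if the index contains duplicate labels it will remove every row carrying a sampled label. In that case, calling data.reset_index(drop=True) before taking the first sample should avoid the problem.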