Taking two samples from the data but with different observations

Time:12-24

My data consists of about 9000 observations and 20 features (edit: it is a pandas DataFrame). I took a sample of 200 observations like this and ran some analysis on it:

sample_data = data.sample(n = 200)

Now I want to randomly take a sample of 1000 observations from the original data, with none of the observations that showed up in the previous n = 200 sample. How do I do that?

CodePudding user response:

If you are using a pandas.DataFrame, you can drop the rows that were already sampled and then sample 1000 new ones from the remaining data:

prev_sample_index = sample_data.index
filtered_data = data.drop(prev_sample_index)
new_sample = filtered_data.sample(n = 1000)
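
To check that the two samples really are disjoint, here is a minimal self-contained sketch of the same approach. The synthetic DataFrame, the column names, and the random_state values are assumptions added for illustration and reproducibility; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the original data: 9000 rows, 20 features.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(9000, 20)),
                    columns=[f"f{i}" for i in range(20)])

# First sample; random_state is only here to make the sketch reproducible.
sample_data = data.sample(n=200, random_state=1)

# Drop the rows already sampled (by index label), then draw from what is left.
remaining = data.drop(sample_data.index)
new_sample = remaining.sample(n=1000, random_state=2)

# The two samples should share no index labels.
overlap = sample_data.index.intersection(new_sample.index)
print(len(remaining), len(new_sample), len(overlap))  # 8800 1000 0
```

Note that drop removes rows by index label, so this relies on the index of data being unique. If the index contains duplicate labels, every row carrying a sampled label would be dropped; calling data.reset_index(drop=True) before taking the first sample is one way to avoid that.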