df.groupby(['target']).count()
Target | data |
---|---|
Negative | 103210 |
Positive | 211082 |
Right now I have too many Positive rows (roughly twice as many as Negative). I want to delete 50% of the rows whose value in the target column is Positive. How can I do it?
Many thanks!
Answer:
You can sample 50% of the Positive rows and drop those indexes:
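# index labels of a random 50% of the Positive rows
indexes = df[df['target'].eq('Positive')].sample(frac=0.5).index
# remove those rows from df (this relies on df having a unique index)
df = df.drop(indexes)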
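A small self-contained run of the sample-and-drop approach on hypothetical toy data (the real frame is much larger). The random_state argument is an addition here, only so the result is reproducible; with frac=0.5 on 8 Positive rows, sample returns exactly 4 of them, so the Positive count is halved while Negative rows are untouched.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's df.
df = pd.DataFrame({
    "target": ["Positive"] * 8 + ["Negative"] * 4,
    "data": range(12),
})

# Index labels of a random 50% of the Positive rows.
# random_state is added only to make this sketch reproducible.
indexes = df[df["target"].eq("Positive")].sample(frac=0.5, random_state=0).index

# Drop those rows; the Negative rows are not affected.
df = df.drop(indexes)

print(df["target"].value_counts())
```

Because drop works on index labels, a frame with duplicate index values would lose more rows than intended; calling df.reset_index(drop=True) first avoids that.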
Answer:
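My approach would be to create an auxiliary column that acts as the 50% selector: take a subset containing only the Positive rows, draw a random 0 or 1 for each of them from a uniform distribution, and keep only the rows where the draw is 1. This keeps roughly (not exactly) half of them.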
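import numpy as np
import pandas as pd
new_def = df[df['target'] == 'Positive'].copy()  # Positive rows only; .copy() avoids SettingWithCopyWarning
new_def['split'] = np.random.randint(2, size=new_def.shape[0])  # one random 0/1 per row (shape[0] is the row count)
new_def = new_def[new_def['split'] == 1].drop(columns='split')  # keep roughly half of the Positive rows
df = pd.concat([df[df['target'] != 'Positive'], new_def])  # put the non-Positive rows back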
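A self-contained sketch of the auxiliary-column idea applied to the whole frame in one pass, under a few assumptions: the toy data is hypothetical, NumPy's Generator (np.random.default_rng) is swapped in for np.random.randint, and a fixed seed is used for reproducibility. Instead of splitting and re-joining, it keeps every non-Positive row plus the Positive rows whose random flag is 1.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 1000 Positive rows, 500 Negative rows.
df = pd.DataFrame({
    "target": ["Positive"] * 1000 + ["Negative"] * 500,
    "data": range(1500),
})

rng = np.random.default_rng(seed=42)

# Auxiliary column: a uniform random 0 or 1 for every row.
df["split"] = rng.integers(0, 2, size=len(df))

# Keep all non-Positive rows, and only the Positive rows whose flag is 1.
mask = df["target"].ne("Positive") | df["split"].eq(1)
df_reduced = df[mask].drop(columns="split")

print(df_reduced["target"].value_counts())
```

Each Positive row survives independently with probability 0.5, so the number kept varies around 500 from run to run. If an exact 50% is required, sample(frac=0.5) is the better fit; the random-flag approach mainly has the advantage of being a single vectorized filter.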