How to delete 50% of the rows that share a certain column value?

df.groupby(['target']).count()

Target	data
Negative	103210
Positive	211082

Right now my Positive class is too large. I want to delete 50% of the rows whose value in the Target column is Positive. How can I do it?

Many thanks !!

CodePudding user response：

You can sample 50% of the Positive rows and drop those indexes:

indexes = df[df['target'].eq('Positive')].sample(frac=0.5).index
df = df.drop(indexes)
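Here is a minimal, self-contained sketch of the same idea on a toy frame. The data values and the `balanced` name are made up for illustration. `random_state` is only there so the same rows are picked on every run. It also assumes the index labels are unique, since `drop` removes every row that carries a dropped label.

```python
import pandas as pd

# Toy data standing in for the real frame (hypothetical values).
df = pd.DataFrame({
    "target": ["Positive"] * 8 + ["Negative"] * 4,
    "data": range(12),
})

# Pick half of the Positive rows; random_state makes the choice repeatable.
indexes = df[df["target"].eq("Positive")].sample(frac=0.5, random_state=0).index

# Drop the sampled rows; every Negative row is left untouched.
balanced = df.drop(indexes)

counts = balanced["target"].value_counts()
print(counts)
```

With `frac=0.5` the number of removed rows is fixed by the size of the Positive group, so the result does not vary in size between runs, only in which rows are removed.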

CodePudding user response：

My take would be to create an auxiliary column that acts as a 50% coin flip. I'd take a subset containing only the Positive rows, give each row a random 0 or 1 drawn uniformly, and keep the rows flagged 1. Note that this keeps roughly half of the rows, not exactly half.

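import numpy as np

new_def = df[df['target'] == 'Positive'].copy()  # Positive rows only; copy avoids SettingWithCopyWarning
new_def['split'] = np.random.randint(2, size=new_def.shape[0])  # one random 0/1 flag per row
new_def = new_def[new_def['split'] == 1]  # roughly half of the Positive rows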
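The subset above holds only Positive rows, so the Negative rows still have to be brought back to answer the original question. One possible way to do both steps at once is a boolean mask over the full frame, sketched below. The toy data, the seed, and the names `is_positive`, `keep_flag` and `reduced` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real frame (hypothetical values).
df = pd.DataFrame({
    "target": ["Positive"] * 1000 + ["Negative"] * 500,
    "data": range(1500),
})

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# True for the rows that belong to the oversized class.
is_positive = df["target"].eq("Positive").to_numpy()

# One random 0/1 flag per row, converted to a boolean "keep" flag.
keep_flag = rng.integers(2, size=len(df)).astype(bool)

# Keep every non-Positive row, and Positive rows only when their flag is set.
reduced = df[~is_positive | keep_flag]

print(reduced["target"].value_counts())
```

Because each Positive row is kept independently with probability 0.5, the Positive count lands near half but varies between seeds. If an exact half is needed, `sample(frac=0.5)` is the safer choice.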