Pandas: retain n% of data for unique values?

Time:07-10

I have a table of products and their reviews:

ProductID  Comment
  1        Great product!
  2        Terrible
  2        Amazing!

The table (a CSV) has about 170,000 rows. I'm looking to retain 5% of the comments for each unique ProductID. Is there any functionality in Pandas that will let me do this?

CodePudding user response:

You could use groupby with sample, which draws a random fraction of the rows within each ProductID group:

df.groupby('ProductID').sample(frac=.05)
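
To see what that one-liner does, here is a small self-contained sketch. The data is made up (100 comments for product 1, 40 for product 2), and `random_state=0` is only there so the draw is repeatable; neither comes from the question.

```python
import pandas as pd

# Hypothetical data: product 1 has 100 comments, product 2 has 40.
df = pd.DataFrame({
    "ProductID": [1] * 100 + [2] * 40,
    "Comment": [f"comment {i}" for i in range(140)],
})

# Sample 5% of the rows within each ProductID group.
# random_state is optional; it only makes the result reproducible.
sampled = df.groupby("ProductID").sample(frac=0.05, random_state=0)

# Count how many rows were kept for each product.
counts = sampled.groupby("ProductID").size()
print(counts)
```

Here product 1 keeps 5 rows and product 2 keeps 2. One thing worth checking on the real data: as far as I can tell, the number of rows kept per group is the rounded value of frac times the group size, so a product with only a few comments may end up with none at 5%.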