I have a table of unique products and reviews:
ProductID Comment
1 Great product!
2 Terrible
2 Amazing!
The table (a CSV) has about 170,000 rows. I'd like to retain 5% of the comments for each unique ProductID. Is there functionality in pandas that will let me do this?
Answer:
You could use groupby with sample (GroupBy.sample is available in pandas 1.1 and later):
df.groupby('ProductID').sample(frac=.05)
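For a quick check of how this behaves, here is a minimal sketch on made-up data. The toy DataFrame, the group sizes, and the random_state seed are assumptions for illustration only; they are not from the original table.

```python
import pandas as pd

# Hypothetical toy data standing in for the ~170,000-row CSV:
# 40 comments for product 1 and 100 comments for product 2.
df = pd.DataFrame({
    "ProductID": [1] * 40 + [2] * 100,
    "Comment": [f"comment {i}" for i in range(140)],
})

# Keep 5% of the rows within each ProductID group.
# random_state is an optional seed that makes the draw repeatable.
sampled = df.groupby("ProductID").sample(frac=0.05, random_state=0)

# Count how many comments survived per product.
counts = sampled["ProductID"].value_counts().sort_index()
print(counts)
```

With the real data, df would come from reading the CSV (for example with pd.read_csv). Note that the number of rows drawn per group is rounded, so products with only a handful of comments may end up with no rows at all; if every product must keep at least one comment, those small groups would need separate handling.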