I'm trying to make a random selection from a list of products:
product_id type year_created
1 shirt 2021
2 book 2021
3 chair 2022
4 shirt 2021
5 book 2022
6 shirt 2022
7 shirt 2022
8 desk 2021
9 shirt 2022
10 lamp 2022
11 tv 2021
12 tv 2022
...
I would like to select random product ids while making sure that the end result has exactly one product of each "type" for each "year_created". So only one shirt from 2021, one shirt from 2022, one book from 2021, one book from 2022, etc.
Is there a way to do this in Python with a function apart from filtering the data and running a random query on each subset? Thank you!
CodePudding user response:
You might want to look into pandas groupby(). I think this example might be what you're looking for.
CodePudding user response:
This is what I have understood:
import pandas as pd
df = pd.DataFrame(dict(
product_id=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
type=['shirt', 'book', 'chair', 'shirt', 'book', 'shirt', 'shirt', 'desk', 'shirt', 'lamp', 'tv', 'tv'],
year_created=[2021, 2021, 2022, 2021, 2022, 2022, 2022, 2021, 2022, 2022, 2021, 2022]
))
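# Group by every (type, year_created) pair and draw group_size random rows from each group.
# With group_size = 1 this yields exactly one product per type per year.
group_size = 1
df.groupby(by=['type', 'year_created'], group_keys=False).apply(lambda x: x.sample(group_size))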
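If your pandas version is 1.1 or newer, the groupby object also has its own sample() method, so the apply/lambda step can probably be skipped. Below is a minimal sketch of that variant, not a definitive implementation. The helper name pick_one_per_group and its seed parameter are my own additions for illustration; the seed is only there to make the draw repeatable.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "product_id": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "type": ["shirt", "book", "chair", "shirt", "book", "shirt",
             "shirt", "desk", "shirt", "lamp", "tv", "tv"],
    "year_created": [2021, 2021, 2022, 2021, 2022, 2022,
                     2022, 2021, 2022, 2022, 2021, 2022],
})


def pick_one_per_group(frame, keys=("type", "year_created"), seed=None):
    """Return one randomly chosen row per distinct combination of `keys`.

    Hypothetical helper: the name and the `seed` argument are assumptions,
    not part of the original answer.
    """
    # DataFrameGroupBy.sample draws n rows from every group independently.
    return frame.groupby(list(keys)).sample(n=1, random_state=seed)


picked = pick_one_per_group(df, seed=0)
print(picked.sort_values("product_id"))
```

Groups that contain a single product (for example the 2021 desk) always return that product; only groups with several candidates, such as the shirts, are actually randomized. Leave seed as None to get a different selection on each run.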