Python pandas conditional random select - selecting random item with multiple criteria

I'm trying to make a random selection from a list of products:

product_id  type   year_created
1           shirt  2021
2           book   2021
3           chair  2022
4           shirt  2021
5           book   2022
6           shirt  2022
7           shirt  2022
8           desk   2021
9           shirt  2022
10          lamp   2022
11          tv     2021
12          tv     2022

...

I would like to select random product ids while making sure that the end result has exactly one product of each "type" for each "year_created". So only one shirt from 2021, one shirt from 2022, one book from 2021, one book from 2022, etc.

Is there a way to do this in Python with a function apart from filtering the data and running a random query on each subset? Thank you!

CodePudding user response:

You might want to look into pandas groupby(). I think this example might be what you're looking for.
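
A minimal sketch of the groupby() idea, assuming pandas 1.1 or later, where grouped objects have their own sample() method. The names picked and picked_ids are only illustrative, and random_state is set purely so the draw can be repeated:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product_id": list(range(1, 13)),
    "type": ["shirt", "book", "chair", "shirt", "book", "shirt",
             "shirt", "desk", "shirt", "lamp", "tv", "tv"],
    "year_created": [2021, 2021, 2022, 2021, 2022, 2022,
                     2022, 2021, 2022, 2022, 2021, 2022],
})

# Draw one random row from every (type, year_created) group.
# random_state is optional; drop it to get a different draw on each run.
picked = df.groupby(["type", "year_created"]).sample(n=1, random_state=0)

# Only the ids, if that is all that is needed.
picked_ids = picked["product_id"].tolist()
print(picked.sort_values(["type", "year_created"]))
```

Groups with a single row (for example the only desk from 2021) always return that row; only groups with several candidates, such as the 2022 shirts, actually vary between runs.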

CodePudding user response:

This is what I have understood:

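import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame(dict(
    product_id=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    type=['shirt', 'book', 'chair', 'shirt', 'book', 'shirt', 'shirt', 'desk', 'shirt', 'lamp', 'tv', 'tv'],
    year_created=[2021, 2021, 2022, 2021, 2022, 2022, 2022, 2021, 2022, 2022, 2021, 2022]
))
df  # in a notebook, this displays the frame

# Number of rows to draw from each (type, year_created) group.
group_size = 1
# group_keys=False keeps the original row index instead of adding the group keys to it.
df.groupby(by=['type', 'year_created'], group_keys=False).apply(lambda x: x.sample(group_size))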
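
If you would rather avoid groupby().apply() altogether, another option (a swapped-in technique, not part of the answer above) is to shuffle the whole frame and keep the first row seen for each pair. This is only a sketch; shuffled and one_per_pair are illustrative names, and random_state is there just for repeatability:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product_id": list(range(1, 13)),
    "type": ["shirt", "book", "chair", "shirt", "book", "shirt",
             "shirt", "desk", "shirt", "lamp", "tv", "tv"],
    "year_created": [2021, 2021, 2022, 2021, 2022, 2022,
                     2022, 2021, 2022, 2022, 2021, 2022],
})

# Shuffle every row; frac=1 returns the full frame in random order.
shuffled = df.sample(frac=1, random_state=42)

# Keep the first row encountered for each (type, year_created) pair.
# After a uniform shuffle, each row in a pair is equally likely to come first.
one_per_pair = shuffled.drop_duplicates(subset=["type", "year_created"])

# Count rows per pair to confirm that each pair appears exactly once.
counts = one_per_pair.groupby(["type", "year_created"]).size()
print(counts)
```

This only covers the one-row-per-group case; for a larger group_size the grouped sampling shown earlier is the more natural fit.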