I have a pandas dataframe df which looks like the following (toy version below; the real df contains many more columns and groups):
group sub fruit
a 1 apple
a 2 banana
a 3 orange
b 1 pear
b 2 strawberry
b 3 cherry
c 1 kiwi
c 2 tomato
c 3 lemon
All groups have the same number of rows. I am trying to generate a new dataframe that contains all the combinations of group and sub by randomly selecting 1 row from each group.
Desired output:
combo group sub fruit
1 a 1 apple
1 b 1 pear
1 c 1 kiwi
2 a 2 banana
2 b 2 strawberry
2 c 1 kiwi
3 a 3 orange
3 b 2 strawberry
3 c 1 kiwi
4 a 2 banana
4 b 2 strawberry
4 c 3 lemon
5 a 3 orange
5 b 3 cherry
5 c 3 lemon
...
In this particular example, I would expect 27 different combos. This example seems helpful, but I haven't been able to iteratively generate each combination: Randomly select a row from each group using pandas
CodePudding user response:
You can use itertools.product on the groups of indices:
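from itertools import product
import pandas as pd

# df.index.groupby(...) maps each group to its row labels; product picks one label per group
out = pd.concat({i: df.loc[list(idx)] for i, idx in
                 enumerate(product(*df.index.groupby(df['group']).values()), start=1)})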
output:
group sub fruit
1 0 a 1 apple
3 b 1 pear
6 c 1 kiwi
2 0 a 1 apple
3 b 1 pear
... ... ... ...
26 5 b 3 cherry
7 c 2 tomato
27 2 a 3 orange
5 b 3 cherry
8 c 3 lemon
[81 rows x 3 columns]
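The result above keeps the combination number in the outer index level rather than in a combo column as in the desired output. A minimal sketch of one way to get a flat frame with that column follows, assuming the toy data from the question; the names index_groups, frames, picked and flat are made up for illustration and are not part of the original answer.

```python
from itertools import product

import pandas as pd

# Toy frame rebuilt from the question
df = pd.DataFrame({
    "group": list("aaabbbccc"),
    "sub": [1, 2, 3] * 3,
    "fruit": ["apple", "banana", "orange", "pear", "strawberry",
              "cherry", "kiwi", "tomato", "lemon"],
})

# One Index of row labels per group
index_groups = [g.index for _, g in df.groupby("group")]

frames = []
# product yields one tuple of row labels per combination (one label per group)
for combo, labels in enumerate(product(*index_groups), start=1):
    picked = df.loc[list(labels)].copy()
    picked.insert(0, "combo", combo)  # put the combination number first
    frames.append(picked)

# Flat frame with columns: combo, group, sub, fruit
flat = pd.concat(frames, ignore_index=True)
print(flat.head(6))
```

With the toy data this should give 27 combos and 81 rows, matching the output above. Note that the number of combinations grows as the product of the group sizes, so with many groups it may be more practical to iterate over product lazily than to concatenate everything.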