Generate combinations by randomly selecting a row from multiple groups (using pandas)

Time:11-11

I have a pandas dataframe df that looks like the following (a toy version; the real df contains many more columns and groups):

group  sub  fruit
a      1    apple
a      2    banana
a      3    orange
b      1    pear
b      2    strawberry
b      3    cherry
c      1    kiwi
c      2    tomato
c      3    lemon

All groups have the same number of rows. I am trying to generate a new dataframe that contains all the combinations of group and sub by randomly selecting 1 row from each group.
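For reference, a single random draw of one row per group could be sketched as below. This assumes pandas 1.1 or later (for `GroupBy.sample`); the `random_state` value is arbitrary and only there for reproducibility. It yields one combination, not all of them.

```python
import pandas as pd

# Toy frame from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "group": list("aaabbbccc"),
    "sub": [1, 2, 3] * 3,
    "fruit": ["apple", "banana", "orange", "pear", "strawberry",
              "cherry", "kiwi", "tomato", "lemon"],
})

# Pick exactly one row from every group at random.
# random_state is an arbitrary seed chosen for this illustration.
one_combo = df.groupby("group").sample(n=1, random_state=0)
print(one_combo)
```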

Desired output:

combo  group  sub  fruit
1      a      1    apple
1      b      1    pear
1      c      1    kiwi
2      a      2    banana
2      b      2    strawberry
2      c      1    kiwi
3      a      3    orange
3      b      2    strawberry
3      c      1    kiwi
4      a      2    banana
4      b      2    strawberry
4      c      3    lemon
5      a      3    orange
5      b      3    cherry
5      c      3    lemon
...

In this particular example, I would expect 27 different combos (3 × 3 × 3, one row chosen from each of the three groups). The question "Randomly select a row from each group using pandas" seems helpful, but I haven't been able to use it to generate every combination iteratively.
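The expected count can be checked with a small sketch: since one row is taken from each group, the number of distinct combos is the product of the group sizes. The variable names here (`group_sizes`, `n_combos`) are illustrative only.

```python
import pandas as pd

# Toy frame from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "group": list("aaabbbccc"),
    "sub": [1, 2, 3] * 3,
    "fruit": ["apple", "banana", "orange", "pear", "strawberry",
              "cherry", "kiwi", "tomato", "lemon"],
})

# Number of rows in each group (3 for a, b and c in the toy data).
group_sizes = df.groupby("group").size()

# One row per group, so the combo count is the product of the sizes.
n_combos = int(group_sizes.prod())
print(n_combos)  # 27 for the toy data
```

Because the count is a product, it grows quickly as groups or rows are added, which is worth keeping in mind for the real data.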

Answer:

You can use itertools.product on the groups of indices:

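from itertools import product

import pandas as pd

# df.index.groupby(df['group']) maps each group to its row labels;
# product(...) then yields every way to pick one label per group.
out = pd.concat({i: df.loc[list(idx)] for i, idx in
                 enumerate(product(*df.index.groupby(df['group']).values()), start=1)})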

Output (the outer index level is the combo number, the inner level is the original row label):

     group  sub   fruit
1  0     a    1   apple
   3     b    1    pear
   6     c    1    kiwi
2  0     a    1   apple
   3     b    1    pear
...    ...  ...     ...
26 5     b    3  cherry
   7     c    2  tomato
27 2     a    3  orange
   5     b    3  cherry
   8     c    3   lemon

[81 rows x 3 columns]
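To get closer to the layout in the question (a plain `combo` column instead of a two-level index) and to put the combinations in random order, one possible variation is sketched below. The names `labels_by_group`, `combos`, `rng` and `order`, and the seed value, are assumptions made for this illustration.

```python
from itertools import product

import numpy as np
import pandas as pd

# Toy frame from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "group": list("aaabbbccc"),
    "sub": [1, 2, 3] * 3,
    "fruit": ["apple", "banana", "orange", "pear", "strawberry",
              "cherry", "kiwi", "tomato", "lemon"],
})

# Row labels belonging to each group, keyed by group name.
labels_by_group = df.index.groupby(df["group"])

# Every way of picking one row label per group.
combos = list(product(*labels_by_group.values()))

# Shuffle the order of the combinations; the seed is arbitrary and
# only fixed here so the result is reproducible.
rng = np.random.default_rng(0)
order = rng.permutation(len(combos))

# Number the shuffled combinations from 1 and stack their rows.
out = pd.concat(
    {i: df.loc[list(combos[j])] for i, j in enumerate(order, start=1)},
    names=["combo"],
)

# Turn the combo level into a regular column and drop the old row labels.
out = out.reset_index(level="combo").reset_index(drop=True)
print(out.head(6))
```

Materializing every combination takes memory proportional to the product of the group sizes, so for many groups it may be preferable to iterate over `product(...)` lazily instead of building the full list.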