Pandas: Combination of all rows without order and without replacement

Time:04-21

I would like to get all possible combinations of rows in a pandas.DataFrame, but without replacement (no row paired with itself) and without order (the pair a, b counts the same as b, a).

I could manage to do the first part (without replacement):

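import pandas as pd
df = pd.DataFrame({'data': ['a', 'b'], 'actual_key': [1, 2], 'dummy_key': [0, 0]})
# Join every row to every row through the constant dummy key
df_combs = pd.merge(df, df, on='dummy_key')
# Drop the pairs where a row is matched with itself
df_combs = df_combs[df_combs['actual_key_x'] != df_combs['actual_key_y']]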

>> df_combs
>> a    1   0   b   2
>> b    2   0   a   1

But I am not able to remove the redundant rows (without order).

CodePudding user response:

It looks like you can avoid the dummy key by cross joining the DataFrame to itself (how='cross', available since pandas 1.2). Filter out the rows where the x and y keys are identical, then build a helper key by putting both keys in a frozenset, which is hashable and ignores order. Dropping duplicates on that helper key leaves one row per unordered pair, e.g.:

(
    df.merge(df, how='cross')
    .query('actual_key_x != actual_key_y')
    .assign(dupekey=lambda v: v[['actual_key_x', 'actual_key_y']].apply(frozenset, axis=1))
    .drop_duplicates(subset=['dupekey'])
    .drop(columns=['dupekey'])
)
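
If the keys are unique and can be compared with "<", a simpler variant might be to keep only the rows where the left key is smaller than the right key. This is just a sketch under that assumption, not a drop-in replacement for the frozenset approach; the three-row frame and the name `pairs` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with three rows, so more than one pair exists.
df = pd.DataFrame({'data': ['a', 'b', 'c'], 'actual_key': [1, 2, 3]})

# Cross join every row with every row (how='cross' needs pandas >= 1.2).
pairs = df.merge(df, how='cross', suffixes=('_x', '_y'))

# Keeping only rows where the left key is strictly smaller than the right key
# removes self-pairs (no replacement) and mirrored pairs (no order) in one step.
# Assumption: actual_key is unique per row and supports "<".
pairs = pairs[pairs['actual_key_x'] < pairs['actual_key_y']].reset_index(drop=True)

print(pairs)
```

This avoids the row-wise apply and the helper column, but it depends on the keys being orderable; the frozenset key above works for any hashable key values.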