I would like to get all possible combinations of rows in a pandas.DataFrame, but without replacement (a row is never paired with itself) and without order (a/b and b/a count as the same combination).
I managed to do the first part (without replacement):
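```python
import pandas as pd

df = pd.DataFrame({'data': ['a', 'b'], 'actual_key': [1, 2], 'dummy_key': [0, 0]})
df_combs = pd.merge(df, df, on='dummy_key')  # every row paired with every row
df_combs = df_combs[df_combs['actual_key_x'] != df_combs['actual_key_y']]  # drop self-pairs
```

```
>>> df_combs
   data_x  actual_key_x  dummy_key  data_y  actual_key_y
1       a             1          0       b             2
2       b             2          0       a             1
```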
But I can't figure out how to remove the redundant rows (without order): rows 1 and 2 above describe the same pair, just in opposite order.
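To make the two requirements concrete, here is a small plain-Python sketch using the standard library's itertools (not part of the original question; the `keys` list is illustrative). The merge plus `!=` filter produces ordered pairs, like `itertools.permutations`, while the goal is unordered pairs, like `itertools.combinations`: n * (n - 1) rows versus n * (n - 1) / 2.

```python
from itertools import combinations, permutations

keys = [1, 2, 3]  # illustrative stand-in for the actual_key column

# What merge + "!=" yields: ordered pairs with no self-pairs (n * (n - 1) of them).
ordered = list(permutations(keys, 2))
# What is wanted: unordered pairs with no self-pairs (n * (n - 1) / 2 of them).
unordered = list(combinations(keys, 2))

print(ordered)    # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]
print(unordered)  # [(1, 2), (1, 3), (2, 3)]
```

Every ordered pair maps onto exactly one unordered pair once order is ignored, which is why deduplicating on an order-insensitive key works.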
Answer:
You can avoid the dummy key entirely by cross joining the DataFrame with itself (`how='cross'`, available since pandas 1.2) and filtering out rows where the x and y keys are identical. Then create a helper key that puts both keys into a frozenset, which is hashable and ignores order (frozenset({1, 2}) == frozenset({2, 1})), drop duplicates on that key, and finally drop the helper column. For example:
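```python
import pandas as pd

df = pd.DataFrame({'data': ['a', 'b'], 'actual_key': [1, 2], 'dummy_key': [0, 0]})

df_combs = (
    df.merge(df, how='cross')                       # every (x, y) pair of rows
    .query('actual_key_x != actual_key_y')          # without replacement: drop self-pairs
    .assign(dupekey=lambda v: v[['actual_key_x', 'actual_key_y']].apply(frozenset, axis=1))
    .drop_duplicates(subset=['dupekey'])            # without order: {1, 2} == {2, 1}
    .drop(columns=['dupekey'])                      # remove the helper column
)
```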
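Two possible variations, sketched under stated assumptions (the variable names and the third row 'c' are illustrative, not from the question). If `actual_key` is unique and its values are mutually comparable, a strict `<` filter keeps exactly one orientation of each pair and also drops self-pairs, so no helper key is needed. Alternatively, `itertools.combinations` over row positions enumerates the unordered pairs directly, which avoids materialising the full n * n cross join first.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({'data': ['a', 'b', 'c'], 'actual_key': [1, 2, 3]})

# Variant 1: "<" keeps (1, 2) but not (2, 1), and never (1, 1).
# Assumes actual_key is unique and its values can be compared with "<".
via_less_than = (
    df.merge(df, how='cross')
    .query('actual_key_x < actual_key_y')
    .reset_index(drop=True)
)

# Variant 2: build only the unordered pairs of row positions.
# Assumes df has at least two rows (otherwise there are no pairs to unpack).
i, j = map(list, zip(*combinations(range(len(df)), 2)))
via_combinations = pd.concat(
    [
        df.iloc[i].add_suffix('_x').reset_index(drop=True),  # left member of each pair
        df.iloc[j].add_suffix('_y').reset_index(drop=True),  # right member of each pair
    ],
    axis=1,
)

print(via_less_than)
print(via_combinations)
```

The frozenset approach in the answer remains the more general choice: it works even when the keys cannot be ordered, whereas the `<` filter relies on a total ordering of `actual_key`.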