I have a pandas DataFrame that looks something like this:
Feature A | Feature B | Feature C |
---|---|---|
A1 | B1 | C1 |
A2 | B2 | C2 |
Given k as input, I want every combination of values drawn from k different features. For example, for k = 2 I want:
[{A:A1, B:B1},
{A:A1, B:B2},
{A:A1, C:C1},
{A:A1, C:C2},
{A:A2, B:B1},
{A:A2, B:B2},
{A:A2, C:C1},
{A:A2, C:C2},
{B:B1, C:C1},
{B:B1, C:C2},
{B:B2, C:C1},
{B:B2, C:C2}]
How can I achieve that?
CodePudding user response:
This is probably not very efficient, but it works at small scale.
First, determine the unique combinations of k columns.
from itertools import combinations
import pandas as pd

k = 2
# every way to choose k of the DataFrame's columns
cols = list(combinations(df.columns, k))
Then use MultiIndex.from_product to get the Cartesian product of the values in each group of k columns.
result = []
for c in cols:
    # extend, rather than overwrite, so every column group is kept
    result += pd.MultiIndex.from_product([df[x] for x in c]).values.tolist()
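Note that the loop above yields plain tuples of values, while the question asks for dicts keyed by feature. If you need that shape, here is a minimal sketch using itertools.product instead of MultiIndex.from_product. The helper name value_combinations and the shortened column names A, B, C are assumptions made for illustration, and duplicates within a column are dropped on the assumption that each pairing should appear only once.

```python
from itertools import combinations, product

import pandas as pd

# Sample frame mirroring the question; the column names A, B, C are an assumption.
df = pd.DataFrame({"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]})


def value_combinations(frame, k):
    """Return one dict per combination of values drawn from k distinct columns."""
    result = []
    for cols in combinations(frame.columns, k):
        # Drop repeated values so each pairing is emitted only once.
        pools = [frame[col].drop_duplicates().tolist() for col in cols]
        for values in product(*pools):
            # Pair each chosen value with the column it came from.
            result.append(dict(zip(cols, values)))
    return result


combos = value_combinations(df, 2)
print(len(combos))
print(combos[0])
```

The dicts come out grouped by column pair ((A, B), then (A, C), then (B, C)) rather than in the exact order listed in the question, but the set of combinations is the same.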