I have a dataframe like the one below:
id | group | log |
---|---|---|
10 | UU1Q | 23 |
10 | UU1Q | 12 |
10 | UU2Q | 15 |
11 | UU2Q | 17 |
11 | UU3Q | 35.6 |
11 | UU1Q | 29.8 |
11 | UU1Q | 33 |
11 | UU1Q | 44 |
13 | UU2Q | 17.77 |
13 | UU2Q | 19.90 |
13 | UU2Q | 55 |
14 | UU3Q | 33 |
15 | UU3Q | 22 |
For each id and group, I want to create all possible combinations of the log values present in the dataframe, in a new column. Desired output:
id | group | log | new_col |
---|---|---|---|
10 | UU1Q | 23 | (23,23) |
10 | UU1Q | 12 | (23,12) |
10 | UU2Q | 15 | (15,15) |
11 | UU2Q | 17 | (17,17) |
11 | UU3Q | 35.6 | (35.6,35.6) |
11 | UU1Q | 29.8 | (29.8, 29.8) |
11 | UU1Q | 33 | (29.8,33) |
11 | UU1Q | 44 | (29.8,44) |
11 | UU1Q | | (33,44) |
13 | UU2Q | 17.77 | (17.77,17.77) |
13 | UU2Q | 19.90 | (17.77,19.90) |
13 | UU2Q | 55 | (17.77,55) |
13 | UU2Q | | (19.90,55) |
14 | UU3Q | 33 | (33,33) |
15 | UU3Q | 22 | (22,22) |
I used the shift function, but it only generates combinations with the next matching cell. I want to get all possible combinations within each group.
dummy['new'] = dummy.groupby(['id', 'group'])['log'].shift()
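For reproducibility, here is a minimal sketch that rebuilds the sample data above (named df, as the answers below use):
import pandas as pd

df = pd.DataFrame({
    'id':    [10, 10, 10, 11, 11, 11, 11, 11, 13, 13, 13, 14, 15],
    'group': ['UU1Q', 'UU1Q', 'UU2Q', 'UU2Q', 'UU3Q', 'UU1Q', 'UU1Q', 'UU1Q',
              'UU2Q', 'UU2Q', 'UU2Q', 'UU3Q', 'UU3Q'],
    'log':   [23, 12, 15, 17, 35.6, 29.8, 33, 44, 17.77, 19.90, 55, 33, 22],
})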
CodePudding user response:
This is close to what you need: build all combinations, and if a group has only one element, create a tuple repeating that value:
from itertools import combinations

# all 2-element combinations of log per (id, group);
# groups with a single row get a (value, value) tuple instead
df = (df.groupby(['id','group'])['log']
        .apply(lambda x: list(combinations(x, 2)) if len(x) > 1 else [(*x, *x)])
        .explode()
        .reset_index(name='comb'))
print (df)
id group comb
0 10 UU1Q (23.0, 12.0)
1 10 UU2Q (15.0, 15.0)
2 11 UU1Q (29.8, 33.0)
3 11 UU1Q (29.8, 44.0)
4 11 UU1Q (33.0, 44.0)
5 11 UU2Q (17.0, 17.0)
6 11 UU3Q (35.6, 35.6)
7 13 UU2Q (17.77, 19.9)
8 13 UU2Q (17.77, 55.0)
9 13 UU2Q (19.9, 55.0)
10 14 UU3Q (33.0, 33.0)
11 15 UU3Q (22.0, 22.0)
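If two separate numeric columns are preferred over the tuples, a possible follow-up on the result above (the column names log_a and log_b are just placeholders):
import pandas as pd

# split each 2-tuple in 'comb' into two columns
df[['log_a', 'log_b']] = pd.DataFrame(df['comb'].tolist(), index=df.index)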
Alternatively, you can create same-value tuples from the first row of each ['id','group'] group and concatenate them with a DataFrame df1 filled by the combinations:
import pandas as pd
from itertools import combinations

# all 2-element combinations per (id, group); single-row groups yield an
# empty list, which explodes to NaN and is dropped
df1 = (df.groupby(['id','group'])['log']
         .apply(lambda x: list(combinations(x, 2)))
         .explode()
         .dropna()
         .reset_index(name='comb'))

# first row of each group turned into a (value, value) tuple
df2 = df.groupby(['id','group']).head(1).copy()
df2['comb'] = df2.pop('log').map(lambda x: (x, x))

df = pd.concat([df2, df1]).sort_values(['id','group'], ignore_index=True)
print (df)
id group comb
0 10 UU1Q (23.0, 23.0)
1 10 UU1Q (23.0, 12.0)
2 10 UU2Q (15.0, 15.0)
3 11 UU1Q (29.8, 29.8)
4 11 UU1Q (29.8, 33.0)
5 11 UU1Q (29.8, 44.0)
6 11 UU1Q (33.0, 44.0)
7 11 UU2Q (17.0, 17.0)
8 11 UU3Q (35.6, 35.6)
9 13 UU2Q (17.77, 17.77)
10 13 UU2Q (17.77, 19.9)
11 13 UU2Q (17.77, 55.0)
12 13 UU2Q (19.9, 55.0)
13 14 UU3Q (33.0, 33.0)
14 15 UU3Q (22.0, 22.0)
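The two steps can also be collapsed into a single apply that prepends the first value's self-pair to the combinations; a sketch starting again from the original df (the helper name pairs is arbitrary):
from itertools import combinations

def pairs(s):
    vals = s.tolist()
    # self-pair of the first value, then all 2-element combinations
    return [(vals[0], vals[0])] + list(combinations(vals, 2))

df3 = (df.groupby(['id','group'])['log']
         .apply(pairs)
         .explode()
         .reset_index(name='comb'))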
CodePudding user response:
This aggregates the log values into a list per id and group:
df.groupby(['id','group'], as_index=False).agg({'log':lambda x: list(x)})
Output:
id group log
0 10 UU1Q [23.0, 12.0]
1 10 UU2Q [15.0]
2 11 UU1Q [29.8, 33.0, 44.0]
3 11 UU2Q [17.0]
4 11 UU3Q [35.6]
5 13 UU2Q [17.77, 19.9, 55.0]
6 14 UU3Q [33.0]
7 15 UU3Q [22.0]
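These lists are not yet the tuple pairs from the desired output, but the same idea as in the first answer can be applied on top of them; a possible extension (comb is just an assumed column name):
from itertools import combinations

out = df.groupby(['id','group'], as_index=False).agg({'log': list})
# self-pair of the first value plus all 2-element combinations, one row per pair
out['comb'] = out['log'].map(lambda v: [(v[0], v[0])] + list(combinations(v, 2)))
out = out.explode('comb')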