i have a dataframe which i'm trying to create new columns showing the occurrence of different combinations within different groups. Solutions I've found are all combinations of values across 2 or more columns instead of one. Therefore, is hoping somebody can help.
sample df:
╔════╦═════╗ ║ id ║ tag ║ ╠════╬═════╣ ║ a ║ 1 ║ ║ a ║ 1 ║ ║ a ║ 2 ║ ║ a ║ 2 ║ ║ a ║ 3 ║ ║ a ║ 3 ║ ║ b ║ 2 ║ ║ b ║ 2 ║ ║ b ║ 2 ║ ║ b ║ 3 ║ ║ b ║ 3 ║ ║ b ║ 3 ║ ╚════╩═════╝
output hope to get:
╔════╦═════╦═════╦═════╦═════╦═════╦═════╦═════╗ ║ id ║ tag ║ 1,1 ║ 1,2 ║ 1,3 ║ 2,2 ║ 2,3 ║ 3,3 ║ ╠════╬═════╬═════╬═════╬═════╬═════╬═════╬═════╣ ║ a ║ 1 ║ 1 ║ 4 ║ 4 ║ 1 ║ 4 ║ 1 ║ ║ a ║ 1 ║ 1 ║ 4 ║ 4 ║ 1 ║ 4 ║ 1 ║ ║ a ║ 2 ║ 1 ║ 4 ║ 4 ║ 1 ║ 4 ║ 1 ║ ║ a ║ 2 ║ 1 ║ 4 ║ 4 ║ 1 ║ 4 ║ 1 ║ ║ a ║ 3 ║ 1 ║ 4 ║ 4 ║ 1 ║ 4 ║ 1 ║ ║ a ║ 3 ║ 1 ║ 4 ║ 4 ║ 1 ║ 4 ║ 1 ║ ║ b ║ 2 ║ 0 ║ 0 ║ 0 ║ 3 ║ 9 ║ 3 ║ ║ b ║ 2 ║ 0 ║ 0 ║ 0 ║ 3 ║ 9 ║ 3 ║ ║ b ║ 2 ║ 0 ║ 0 ║ 0 ║ 3 ║ 9 ║ 3 ║ ║ b ║ 3 ║ 0 ║ 0 ║ 0 ║ 3 ║ 9 ║ 3 ║ ║ b ║ 3 ║ 0 ║ 0 ║ 0 ║ 3 ║ 9 ║ 3 ║ ║ b ║ 3 ║ 0 ║ 0 ║ 0 ║ 3 ║ 9 ║ 3 ║ ╚════╩═════╩═════╩═════╩═════╩═════╩═════╩═════╝
sample df code:
data = {
"id": ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
"tag": [1, 1, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3]}
df = pd.DataFrame(data)
for clarification: "col "x,y" is the combinations of the tag values grouped by the id" as mentioned by @Chrysophylaxs (thanks).
kindly advise
CodePudding user response:
I got the answer here:
data = {
"id": ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
"tag": [1, 1, 2, 2, 3, 3, 2, 2, 2, 3, 3, 3]}
df = pd.DataFrame(data)
from itertools import combinations
df['combinations'] = df.groupby(['id']).transform(lambda x: str(list(combinations(x.to_list(), 2))))
df['combinations'] = df['combinations'].apply(lambda x: x.replace('[', '').replace(']', '').replace('),', '*').replace(', ','-').replace('*',',').replace('(','').replace(')','').replace(' ','')).str.split(',')
df2 = df.drop_duplicates(['combinations'])
x = df2.explode('combinations')
x = x.drop('tag', axis=1).groupby(['id', 'combinations']).value_counts().unstack().reset_index().fillna(0)
df.merge(x, on='id', how='left').drop('combinations', axis=1)
Output:
id tag 1-1 1-2 1-3 2-2 2-3 3-3
0 a 1 1.0 4.0 4.0 1.0 4.0 1.0
1 a 1 1.0 4.0 4.0 1.0 4.0 1.0
2 a 2 1.0 4.0 4.0 1.0 4.0 1.0
3 a 2 1.0 4.0 4.0 1.0 4.0 1.0
4 a 3 1.0 4.0 4.0 1.0 4.0 1.0
5 a 3 1.0 4.0 4.0 1.0 4.0 1.0
6 b 2 0.0 0.0 0.0 3.0 9.0 3.0
7 b 2 0.0 0.0 0.0 3.0 9.0 3.0
8 b 2 0.0 0.0 0.0 3.0 9.0 3.0
9 b 3 0.0 0.0 0.0 3.0 9.0 3.0
10 b 3 0.0 0.0 0.0 3.0 9.0 3.0
11 b 3 0.0 0.0 0.0 3.0 9.0 3.0