Home > database >  Count and Sum of distinct values of the tuple pairs in python
Count and Sum of distinct values of the tuple pairs in python

Time:06-10

import pandas as pd
dt = {'order_id': ['A','A','B','B','B','C'], 'XY_ID': [4,5,4,5,6,4]}
print(pd.DataFrame(data=dt))
order_id XY_ID
A 4
A 5
B 4
B 5
B 6
C 4

and i want by output like below which is group all unique tuples of two from Col2 and count the number of Col1 associated with them

for 4,5 pairs A->4,B->4, A->5,B->5 count is 2, because there is 2 common relation via 4,
for 5,6 pairs A->5, B->5 B->6 count is 1, because there is 1 common relation via 5

XY_ID_Tuple_IDX1 XY_ID_Tuple_ID2 order count
4 5 2
5 6 1
4 6 1

i tried with df.groupby(['col1', 'col2']).size().reset_index(name='count') and pivot_table()

CodePudding user response:

Here is an approach using itertools.combinations:

from itertools import combinations

s = (df.groupby('order_id')['XY_ID']
       .agg(lambda x: list(combinations(x, 2)))
       .explode()
     )

out = s.groupby(s).count()

output:

XY_ID
(4, 5)    2
(4, 6)    1
(5, 6)    1
Name: XY_ID, dtype: int64

Provided format:

# code above  

idx = pd.MultiIndex.from_tuples(out.index,
                                names=['XY_ID_Tuple_IDX1', 'XY_ID_Tuple_IDX2'])
out2 = out.to_frame('order_id').set_axis(idx).reset_index()

output:

   XY_ID_Tuple_IDX1  XY_ID_Tuple_IDX2  order_id
0                 4                 5         2
1                 4                 6         1
2                 5                 6         1

CodePudding user response:

The trick is to join the dataframe with itself:

counts = (df.merge(df, on='order_id')
            .groupby(['XY_ID_x', 'XY_ID_y'])
            .count()
            .reset_index()
          )
counts.loc[counts['XY_ID_x'] < counts['XY_ID_y']]

counts content:

   XY_ID_x  XY_ID_y  order_id
1        4        5         2
2        4        6         1
5        5        6         1
  • Related