How to aggregate DataFrame to count valus per ID in also with other columns in output Python Pandas?-CodePudding

I have DataFrame in Python Pandas like below:

ID  | COL1 | ... | COL_n
----|------|-----|-------
123 | P    | ... | ...
123 | P    | ... | ...
123 | M    | ... | ...
444 | BM   | ... | ...
567 | BM   | ... | ...
29  | P    | ... | ...

And I need to aggregate above columns so as to have output like below:

How many time client bought P -> col: X_P
How many time client bought M -> col: X_M
How many time client bought in total M P -> col: X_PM
in my real dataset I have many more columns and I need to also have these columns in output

ID COL1 ... COL_n X_P X_M X_PM

123 P ... ... 2 1 3

123 P ... ... 2 1 3

123 M ... ... 2 1 3

444 BM ... ... 0 0 0

567 BM ... ... 0 0 0

29 P ... ... 1 0 1

ID	COL1	...	COL_n	X_P	X_M	X_PM
123	P	...	...	2	1	3
123	P	...	...	2	1	3
123	M	...	...	2	1	3
444	BM	...	...	0	0	0
567	BM	...	...	0	0	0
29	P	...	...	1	0	1

How can I do that in Python Pandas ?

CodePudding user response：

You could do two steps with pd.crosstab

s = (pd.crosstab(df['ID'],df['COL1'])[['M','P']].
        assign(MP=lambda x : x.sum(1)).
        reindex(df['ID']).
        add_prefix('X_'))
s.index = df.index
df = df.join(s)
Out[178]: 
    ID COL1  X_M  X_P  X_MP
0  123    P    1    2     3
1  123    P    1    2     3
2  123    M    1    2     3
3  444   BM    0    0     0
4  567   BM    0    0     0
5   29    P    0    1     1

ID	COL1	...	COL_n	X_P	X_M	X_PM
123	P	...	...	2	1	3
123	P	...	...	2	1	3
123	M	...	...	2	1	3
444	BM	...	...	0	0	0
567	BM	...	...	0	0	0
29	P	...	...	1	0	1

ID	COL1	...	COL_n	X_P	X_M	X_PM
123	P	...	...	2	1	3
123	P	...	...	2	1	3
123	M	...	...	2	1	3
444	BM	...	...	0	0	0
567	BM	...	...	0	0	0
29	P	...	...	1	0	1

ID	COL1	...	COL_n	X_P	X_M	X_PM
123	P	...	...	2	1	3
123	P	...	...	2	1	3
123	M	...	...	2	1	3
444	BM	...	...	0	0	0
567	BM	...	...	0	0	0
29	P	...	...	1	0	1