I have DataFrame in Python Pandas like below:
ID | COL1 | ... | COL_n
----|------|-----|-------
123 | P | ... | ...
123 | P | ... | ...
123 | M | ... | ...
444 | BM | ... | ...
567 | BM | ... | ...
29 | P | ... | ...
And I need to aggregate above columns so as to have output like below:
How many time client bought P -> col: X_P
How many time client bought M -> col: X_M
How many time client bought in total M P -> col: X_PM
in my real dataset I have many more columns and I need to also have these columns in output
ID COL1 ... COL_n X_P X_M X_PM 123 P ... ... 2 1 3 123 P ... ... 2 1 3 123 M ... ... 2 1 3 444 BM ... ... 0 0 0 567 BM ... ... 0 0 0 29 P ... ... 1 0 1
How can I do that in Python Pandas ?
CodePudding user response:
You could do two steps with pd.crosstab
s = (pd.crosstab(df['ID'],df['COL1'])[['M','P']].
assign(MP=lambda x : x.sum(1)).
reindex(df['ID']).
add_prefix('X_'))
s.index = df.index
df = df.join(s)
Out[178]:
ID COL1 X_M X_P X_MP
0 123 P 1 2 3
1 123 P 1 2 3
2 123 M 1 2 3
3 444 BM 0 0 0
4 567 BM 0 0 0
5 29 P 0 1 1