Home > Blockchain >  How to aggregate DataFrame to count valus per ID in also with other columns in output Python Pandas?
How to aggregate DataFrame to count valus per ID in also with other columns in output Python Pandas?

Time:10-11

I have DataFrame in Python Pandas like below:

ID  | COL1 | ... | COL_n
----|------|-----|-------
123 | P    | ... | ...
123 | P    | ... | ...
123 | M    | ... | ...
444 | BM   | ... | ...
567 | BM   | ... | ...
29  | P    | ... | ...

And I need to aggregate above columns so as to have output like below:

  • How many time client bought P -> col: X_P

  • How many time client bought M -> col: X_M

  • How many time client bought in total M P -> col: X_PM

  • in my real dataset I have many more columns and I need to also have these columns in output

    ID COL1 ... COL_n X_P X_M X_PM
    123 P ... ... 2 1 3
    123 P ... ... 2 1 3
    123 M ... ... 2 1 3
    444 BM ... ... 0 0 0
    567 BM ... ... 0 0 0
    29 P ... ... 1 0 1

How can I do that in Python Pandas ?

CodePudding user response:

You could do two steps with pd.crosstab

s = (pd.crosstab(df['ID'],df['COL1'])[['M','P']].
        assign(MP=lambda x : x.sum(1)).
        reindex(df['ID']).
        add_prefix('X_'))
s.index = df.index
df = df.join(s)
Out[178]: 
    ID COL1  X_M  X_P  X_MP
0  123    P    1    2     3
1  123    P    1    2     3
2  123    M    1    2     3
3  444   BM    0    0     0
4  567   BM    0    0     0
5   29    P    0    1     1
  • Related