Pandas transform columns into counts grouped by ID

Time:09-29

I have a dataframe like this:

|   |ID |sex|est|
| 0 |aaa| M | S |
| 1 |aaa| M | C |
| 2 |aaa| F | D |
| 3 |bbb| F | D |
| 4 |bbb| M | C |
| 5 |ccc| F | C |

I need to change it to this:

|   |ID | M | F | S | C | D |
| 0 |aaa| 2 | 1 | 1 | 1 | 1 |
| 1 |bbb| 1 | 1 | 0 | 1 | 1 |
| 2 |ccc| 0 | 1 | 0 | 1 | 0 |

For each unique ID, I need to count how many times each value appears in the sex and est columns. I can't do it manually because there are too many rows and columns.

CodePudding user response:

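Try this. It stacks the value columns into one long Series, one-hot encodes it with str.get_dummies, and sums the indicators per ID: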

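out = (df
       .set_index('ID')      # keep ID out of the values being counted
       .stack()              # sex and est values become one long Series
       .str.get_dummies()    # one 0/1 column per distinct value
       .groupby(level=0)     # index level 0 is ID
       .sum()                # add the indicators up per ID
       .reset_index()
      )
print(out)

Output:

    ID  C  D  F  M  S
0  aaa  1  1  1  2  1
1  bbb  1  1  1  1  0
2  ccc  1  0  1  0  0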
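
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the table in the question, and the final column selection is an addition of mine (not part of the answer above) to get the M, F, S, C, D order that was asked for, since str.get_dummies returns its columns sorted alphabetically.

```python
import pandas as pd

# Sample frame reconstructed from the question's table.
df = pd.DataFrame({
    "ID":  ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc"],
    "sex": ["M", "M", "F", "F", "M", "F"],
    "est": ["S", "C", "D", "D", "C", "C"],
})

out = (
    df.set_index("ID")
      .stack()               # long Series indexed by (ID, original column name)
      .str.get_dummies()     # one 0/1 indicator column per distinct value
      .groupby(level=0)      # level 0 of the index is ID
      .sum()
      .reset_index()
)

# Added step: reorder the columns to match the layout in the question.
out = out[["ID", "M", "F", "S", "C", "D"]]
print(out)
```

The explicit column list only works when every listed value actually occurs in the data; with real data you would probably build that list from the columns you expect.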

CodePudding user response:

Call pd.get_dummies on the DataFrame directly, which avoids the stack step, then group by ID and sum:

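import pandas as pd

(pd
 .get_dummies(
     df,
     columns=['sex', 'est'],   # only these columns are one-hot encoded
     prefix='',                # empty prefix and separator, so the new
     prefix_sep='')            # columns are named after the values alone
 .groupby('ID', as_index=False)
 .sum()
)

Output:

    ID  F  M  C  D  S
0  aaa  1  2  1  1  1
1  bbb  1  1  1  1  0
2  ccc  1  0  1  0  0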
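
A runnable version of the same idea, as a sketch under a few assumptions: the sample frame is rebuilt from the question, the name `counts` is mine, and `dtype=int` is added so the indicator columns are integers on every pandas version (recent releases return booleans by default, which still sum correctly). The last line reorders the columns to the layout requested in the question.

```python
import pandas as pd

# Sample frame reconstructed from the question's table.
df = pd.DataFrame({
    "ID":  ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc"],
    "sex": ["M", "M", "F", "F", "M", "F"],
    "est": ["S", "C", "D", "D", "C", "C"],
})

counts = (
    pd.get_dummies(
        df,
        columns=["sex", "est"],  # ID is left untouched
        prefix="",               # no prefix, so columns are named M, F, S, ...
        prefix_sep="",
        dtype=int,               # assumption: force integer indicators
    )
    .groupby("ID", as_index=False)
    .sum()
)

# Added step: reorder the columns to match the layout in the question.
counts = counts[["ID", "M", "F", "S", "C", "D"]]
print(counts)
```

One thing to watch with the empty prefix: if sex and est ever share a label, the result should end up with two columns of the same name, so keeping the default prefixes (sex_M, est_S, and so on) is likely the safer choice in that case.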