I have a dataframe like this:
| |ID |sex|est|
| 0 |aaa| M | S |
| 1 |aaa| M | C |
| 2 |aaa| F | D |
| 3 |bbb| F | D |
| 4 |bbb| M | C |
| 5 |ccc| F | C |
I need to change it to this:
| |ID | M | F | S | C | D |
| 0 |aaa| 2 | 1 | 1 | 1 | 1 |
| 1 |bbb| 1 | 1 | 0 | 1 | 1 |
| 2 |ccc| 0 | 1 | 0 | 1 | 0 |
For each unique ID, I need to count how many times each value of sex and est occurs. I can't do this manually because there are too many rows and columns.
CodePudding user response:
Try this:
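out = (df
    .set_index('ID')      # keep ID out of the values being counted
    .stack()              # sex and est values become one long Series
    .str.get_dummies()    # one 0/1 indicator column per distinct value
    .groupby(level=0)     # group by ID (index level 0)
    .sum()                # add up the indicators to get counts
    .reset_index()        # turn ID back into a regular column
)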
print(out)
ID C D F M S
0 aaa 1 1 1 2 1
1 bbb 1 1 1 1 0
2 ccc 1 0 1 0 0
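For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same chain. The intermediate name `stacked` and the final column selection are my additions, not part of the answer above; the selection only puts the columns in the order the question asked for.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":  ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc"],
    "sex": ["M", "M", "F", "F", "M", "F"],
    "est": ["S", "C", "D", "D", "C", "C"],
})

# One long Series of values, indexed by (ID, original column name)
stacked = df.set_index("ID").stack()

# One indicator column per distinct value, then add the indicators up per ID
out = stacked.str.get_dummies().groupby(level=0).sum().reset_index()

# Optional: put the columns in the order requested in the question
out = out[["ID", "M", "F", "S", "C", "D"]]
print(out)
```

Because everything is stacked into a single Series first, a value that appeared in both sex and est would be counted in one shared column.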
CodePudding user response:
Use pd.get_dummies directly, which avoids the stack step, and then aggregate with groupby:
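import pandas as pd

(pd
    .get_dummies(
        df,
        columns=['sex', 'est'],   # only these columns are encoded; ID is left as-is
        prefix_sep='',
        prefix='')                # empty prefix: columns are named after the values alone
    .groupby('ID', as_index=False)
    .sum()                        # add up the indicators per ID to get counts
)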
ID F M C D S
0 aaa 1 2 1 1 1
1 bbb 1 1 1 1 0
2 ccc 1 0 1 0 0
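A self-contained sketch of the same idea, assuming the sample data from the question. Passing `dtype=int` and reordering the columns are my additions: recent pandas versions return boolean indicator columns by default (summing them still gives counts), so the explicit dtype just makes the intermediate frame easier to read.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID":  ["aaa", "aaa", "aaa", "bbb", "bbb", "ccc"],
    "sex": ["M", "M", "F", "F", "M", "F"],
    "est": ["S", "C", "D", "D", "C", "C"],
})

# Encode sex and est as integer indicator columns named after the values
dummies = pd.get_dummies(
    df,
    columns=["sex", "est"],
    prefix="",
    prefix_sep="",
    dtype=int,
)

# Sum the indicators per ID to get counts
out = dummies.groupby("ID", as_index=False).sum()

# Optional: put the columns in the order requested in the question
out = out[["ID", "M", "F", "S", "C", "D"]]
print(out)
```

Note that the empty prefix relies on sex and est having no values in common. If they shared a value, this would produce two columns with the same name, and keeping the default prefixes would be the safer choice.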