This is data that indicates whether a single person has capitalizable and non-capitalizable contracts. I want to have it grouped by names and create a new column that tells me if he/she has both contracts and the sum of all contracts. (its a pretty big one)
id | name | Capitalization | number of contracts |
---|---|---|---|
1 | Jimmy | yes | 3 |
1 | Jimmy | no | 1 |
2 | Jenny | no | 7 |
3 | Elle | yes | 5 |
4 | Danny | yes | 2 |
5 | Charles | yes | 1 |
6 | Freddy | no | 3 |
7 | Elle | yes | 5 |
7 | Elle | no | 3 |
The final result would look like this:
id | name | Capitalization_x | number of contracts |
---|---|---|---|
1 | Jimmy | both | 4 |
2 | Jenny | no | 7 |
3 | Elle | yes | 5 |
4 | Danny | yes | 2 |
5 | Charles | yes | 1 |
6 | Freddy | no | 3 |
7 | Elle | both | 8 |
I am really stuck with this problem.
CodePudding user response:
Using aggregation:
(df.groupby('id').agg({'Capitalization': lambda s: 'both' if len(set(s))==2 else s,
'number of contracts': 'sum',
'name': 'first'}))
Here, I assume the only possible values in Capitalization
are "yes"
or "no"
. Let me know if otherwise.
You can always change the logic in the lambda function to accommodate more complex cases.
name Capitalization number of contracts
id
1 Jimmy both 4
2 Jenny no 7
3 Elle yes 5
4 Danny yes 2
5 Charles yes 1
6 Freddy no 3
7 Elle both 8