Home > Back-end >  How to modify a dataframe with conditions for each name duplicated
How to modify a dataframe with conditions for each name duplicated

Time:06-15

This is data that indicates whether a single person has capitalizable and non-capitalizable contracts. I want to have it grouped by names and create a new column that tells me if he/she has both contracts and the sum of all contracts. (its a pretty big one)

id name Capitalization number of contracts
1 Jimmy yes 3
1 Jimmy no 1
2 Jenny no 7
3 Elle yes 5
4 Danny yes 2
5 Charles yes 1
6 Freddy no 3
7 Elle yes 5
7 Elle no 3

The final result would look like this:

id name Capitalization_x number of contracts
1 Jimmy both 4
2 Jenny no 7
3 Elle yes 5
4 Danny yes 2
5 Charles yes 1
6 Freddy no 3
7 Elle both 8

I am really stuck with this problem.

CodePudding user response:

Using aggregation:

(df.groupby('id').agg({'Capitalization': lambda s: 'both' if len(set(s))==2 else s, 
                       'number of contracts': 'sum',
                       'name': 'first'}))

Here, I assume the only possible values in Capitalization are "yes" or "no". Let me know if otherwise.

You can always change the logic in the lambda function to accommodate more complex cases.


       name Capitalization  number of contracts
id                                             
1     Jimmy           both                    4
2     Jenny             no                    7
3      Elle            yes                    5
4     Danny            yes                    2
5   Charles            yes                    1
6    Freddy             no                    3
7      Elle           both                    8
  • Related