Let's say I have the following data: one person can have multiple Constituency Code
Unique_ID | Name | Constituency Code |
---|---|---|
404 | Mark | Teacher |
404 | Mark | Staff |
404 | Mark | Staff |
659 | Julio | Student |
1025 | Jasmine | Staff |
1025 | Jasmine | Student |
Question: Using python and pandas, how would I combine them to this: List of unique id and name with all the constituency code combined. Notice that if there are duplicates in constituency code, but it combines it to 1 list of unique constituency code per Unique_ID
Unique_ID | Name | Constituency Code |
---|---|---|
404 | Mark | Teacher, Staff |
659 | Julio | Student |
1025 | Jasmine | Staff, Student |
I have tried to use groupby and aggregate attributes on pandas, but I am not able to get the result I want. Any suggestions. I know I can do it by iterating through each row, but I rather not do that.
CodePudding user response:
Setting up data - very important to provide working examples
test_data = [
[404, 'Mark', 'Teacher'],
[404, 'Mark', 'Staff'],
[404, 'Mark', 'Staff'],
[659, 'Julio', 'Student'],
[1025, 'Jasmine', 'Staff'],
[1025, 'Jasmine', 'Student']
]
cols = ['Unique_ID', 'Name', 'Constinuency Code']
df = pd.DataFrame(test_data, columns=cols)
df.groupby(['Unique_ID', 'Name'])['Constinuency Code'].apply(lambda grp: list(set(grp))).reset_index()
Out
Unique_ID Name Constinuency Code
0 404 Mark [Teacher, Staff]
1 659 Julio [Student]
2 1025 Jasmine [Student, Staff]
If you need a string instead:
df.groupby(['Unique_ID', 'Name'])['Constinuency Code'].apply(lambda grp: ', '.join(set(grp))).reset_index()
Out
Unique_ID Name Constinuency Code
0 404 Mark Teacher, Staff
1 659 Julio Student
2 1025 Jasmine Student, Staff