I am working using python and pandas with the dataset below
Name | Subject | Grade |
---|---|---|
Alex | Science | A |
Bob | Maths | B |
Bob | Maths | C |
Cynthia | Science | C |
Dylan | Geography | A |
Dylan | Geography | A |
I would like to find names and subjects that have been repeated twice and display them exactly like the table below. The grades will only get concatenated by a comma if required.
Name | Subject | Grade |
---|---|---|
Bob | Maths | B,C |
Dylan | Geography | A |
Any help would be greatly appreciated
CodePudding user response:
You could find your duplicated rows based on Name and Subject columns using duplicated
, and then use groupby
with a custom join
.
set
is used to ensure a single Grade is returned when Grades are the same.
key_cols = ['Name','Subject']
df[df.duplicated(key_cols, keep=False)].groupby(key_cols,as_index=False).agg({'Grade':lambda x: ','.join(set(x))})
prints:
Name Subject Grade
0 Bob Maths C,B
1 Dylan Geography A