Home > Enterprise >  Pandas Return values based on a range of data with condition
Pandas Return values based on a range of data with condition

Time:06-24

I am working using python and pandas with the dataset below

Name Subject Grade
Alex Science A
Bob Maths B
Bob Maths C
Cynthia Science C
Dylan Geography A
Dylan Geography A

I would like to find names and subjects that have been repeated twice and display them exactly like the table below. The grades will only get concatenated by a comma if required.

Name Subject Grade
Bob Maths B,C
Dylan Geography A

Any help would be greatly appreciated

CodePudding user response:

You could find your duplicated rows based on Name and Subject columns using duplicated, and then use groupby with a custom join.

set is used to ensure a single Grade is returned when Grades are the same.

key_cols = ['Name','Subject']
df[df.duplicated(key_cols, keep=False)].groupby(key_cols,as_index=False).agg({'Grade':lambda x: ','.join(set(x))})

prints:

    Name    Subject Grade
0    Bob      Maths   C,B
1  Dylan  Geography     A
  • Related