How to find that each student number belongs to unique student?-CodePudding

I have a data frame and around 10-12 columns. One of the column is the student number e.g. 1234567 and the other is an identifier e.g passport numbers, license number . How can I find that each student has a unique identifier. Like student 1234567 has identifier ABC5679K only. Also I want to store the students who are tagged with duplicate identifier. For e.g. If student 1234567 also has identifier ABC3408T, I want to know those.

CodePudding user response：

Assuming such input, where the first student has a unique other id, while the second has two different ids:

   student_id other_id
0           1        a
1           1        a
2           2        b
3           2        c

you can use GroupBy.transform with nunique for that:

mask = df.groupby('student_id')['other_id'].transform('nunique').gt(1)

print(df[mask])

output:

   student_id other_id
2           2        b
3           2        c

Or, for a list of all students with duplicated other ids:

s = df.groupby('student_id')['other_id'].nunique()
s[s>1].index.to_list()

output: [2]

CodePudding user response：

df.groupby(["student_name"])["passport_number"].nunique() > 1

You can use the groupby and nunique function to help you identify repeats. Hope this answer your question.