My goal is to compare every row with all other rows to find out how many rows are unique in terms of their entries. I am quite new to pandas, so I am at a loss. An example DataFrame would be as follows:
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3],
                   "age": [46, 48, 55],
                   "gender": ['female', 'female', 'male']},
                  index=[0, 1, 2])
CodePudding user response:
What do you want to obtain exactly?
If you want to know how many unique values each column has, use nunique:
df.nunique()
ID        3
age       3
gender    2
dtype: int64
If you want to know how many unique rows you have (considering combinations of columns), use duplicated:
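# total rows minus the rows that repeat an earlier age/gender combination
len(df) - df[['age', 'gender']].duplicated().sum()
# or: compare all columns except ID and count the rows not flagged as duplicates
(~df.drop(columns='ID').duplicated()).sum()
# or: same idea, selecting the columns to compare explicitly
(~df[['age', 'gender']].duplicated()).sum()
# each expression gives:
3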
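Since the example frame has no repeated rows, the result above equals len(df). To show how the count behaves when duplicates do exist, here is a minimal sketch on made-up data (the fourth row and the names cols, n_distinct and n_unrepeated are assumptions for illustration, not part of the question). It also separates two possible readings of "unique rows": the number of distinct combinations, and the number of rows whose combination never repeats.

```python
import pandas as pd

# Hypothetical data: rows 0 and 3 share the same age/gender combination.
df = pd.DataFrame({"ID": [1, 2, 3, 4],
                   "age": [46, 48, 55, 46],
                   "gender": ["female", "female", "male", "female"]})

# Columns to compare; ID is left out because it differs on every row.
cols = ["age", "gender"]

# Reading 1: number of distinct combinations (each duplicate group counted once).
n_distinct = len(df.drop_duplicates(subset=cols))

# Reading 2: number of rows whose combination appears exactly once.
# keep=False flags every member of a duplicate group, not just the later ones.
n_unrepeated = (~df.duplicated(subset=cols, keep=False)).sum()

print(n_distinct)    # 3
print(n_unrepeated)  # 2
```

If you need the rows themselves rather than a count, df[~df.duplicated(subset=cols, keep=False)] should return the non-repeated ones.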