Comparing every row with all other rows with pandas

Time:05-17

My goal is to compare every row with all other rows to see how many rows are unique with respect to their entries. I am quite new to pandas, so I am at a loss. An example dataframe would be as follows:

import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3],
                   "age": [46, 48, 55],
                   "gender": ['female', 'female', 'male']},
                  index=[0, 1, 2])

CodePudding user response:

What do you want to obtain exactly?

If you want to know per column how many unique values you have, use nunique:

df.nunique()

ID        3
age       3
gender    2
dtype: int64
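
One detail worth knowing about nunique: by default it ignores missing values. The sketch below uses a hypothetical frame (df_nan is not from the question) with one missing gender to show the difference that dropna=False makes.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: same columns as in the question, but with one missing gender.
df_nan = pd.DataFrame({"ID": [1, 2, 3],
                       "age": [46, 48, 55],
                       "gender": ["female", np.nan, "male"]})

# Default behaviour: NaN is not counted as a value.
per_column = df_nan.nunique()

# With dropna=False, NaN counts as one more distinct value.
per_column_with_nan = df_nan.nunique(dropna=False)

print(per_column["gender"], per_column_with_nan["gender"])
```

If the data has no missing values, both calls return the same counts.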

If you want to know how many unique rows you have (considering combinations of columns), use duplicated, which flags every row that repeats an earlier one:

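# total rows minus the rows that repeat an earlier (age, gender) combination
len(df) - df[['age', 'gender']].duplicated().sum()

# or: drop the ID column and count the rows not flagged as duplicates
(~df.drop(columns='ID').duplicated()).sum()

# or: select the columns explicitly
(~df[['age', 'gender']].duplicated()).sum()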

3
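
The example frame has no repeated rows, so every approach gives 3. As a rough sketch of how the counts differ once duplicates exist, the hypothetical frame below (df_dup, not from the question) repeats one age/gender combination. It also assumes "unique" could mean either "distinct combinations" or "combinations that occur exactly once", since the question leaves that open.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 share the same age and gender.
df_dup = pd.DataFrame({"ID": [1, 2, 3, 4],
                       "age": [46, 46, 55, 48],
                       "gender": ["female", "female", "male", "female"]})

cols = ["age", "gender"]

# Distinct combinations: each combination counts once, however often it appears.
n_distinct = (~df_dup.duplicated(subset=cols)).sum()

# Combinations that occur exactly once: keep=False flags every member of a
# duplicate group, so only rows without any twin remain unflagged.
n_singletons = (~df_dup.duplicated(subset=cols, keep=False)).sum()

# The distinct count can also be obtained with drop_duplicates.
n_distinct_alt = len(df_dup.drop_duplicates(subset=cols))

print(n_distinct, n_singletons, n_distinct_alt)
```

Which of the two counts is the right one depends on what "unique" is supposed to mean for the data at hand.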