Home > Software engineering >  Pandas - get columns where all values are unique (distinct)
Pandas - get columns where all values are unique (distinct)

Time:08-07

I have a dataframe with many column and I am trying to get the columns where all values are unique (distinct).

I was able to to this for columns without missing values:

df.columns[df.nunique(dropna=False) == len(df)]

But I can't find a simple solution for columns with NaNs

CodePudding user response:

This will print all columns that contains unique values excluding NaN

[col for col in df.columns if df[col].dropna().is_unique ]

Here is another one liner solution without using loop

df.columns[df.apply(lambda x : x.dropna().is_unique, axis=0)]

To get it in an array form you can use

df.columns[df.apply(lambda x : x.dropna().is_unique, axis=0)].array

CodePudding user response:

You can get the column names where all values are unique as follows

all_unique = [item for item in df.columns if len(df) == len(df[item].unique())]

Then you can simply call them as usual to get the column with only unique values

df[all_unique]

CodePudding user response:

nunique and count

df.columns[df.nunique() == df.count()]

How this works?

  • nunique counts the unique values (excluding NaN's)
  • count as the name suggests counts all the values (excluding NaN's)
  • Related