Pandas - get columns where all values are unique (distinct)-CodePudding

I have a dataframe with many column and I am trying to get the columns where all values are unique (distinct).

I was able to to this for columns without missing values:

df.columns[df.nunique(dropna=False) == len(df)]

But I can't find a simple solution for columns with NaNs

CodePudding user response：

This will print all columns that contains unique values excluding NaN

[col for col in df.columns if df[col].dropna().is_unique ]

Here is another one liner solution without using loop

df.columns[df.apply(lambda x : x.dropna().is_unique, axis=0)]

To get it in an array form you can use

df.columns[df.apply(lambda x : x.dropna().is_unique, axis=0)].array

CodePudding user response：

You can get the column names where all values are unique as follows

all_unique = [item for item in df.columns if len(df) == len(df[item].unique())]

Then you can simply call them as usual to get the column with only unique values

df[all_unique]

CodePudding user response：

`nunique` and `count`

df.columns[df.nunique() == df.count()]