I have a dataframe with many column and I am trying to get the columns where all values are unique (distinct).
I was able to to this for columns without missing values:
df.columns[df.nunique(dropna=False) == len(df)]
But I can't find a simple solution for columns with NaNs
CodePudding user response:
This will print all columns that contains unique values excluding NaN
[col for col in df.columns if df[col].dropna().is_unique ]
Here is another one liner solution without using loop
df.columns[df.apply(lambda x : x.dropna().is_unique, axis=0)]
To get it in an array form you can use
df.columns[df.apply(lambda x : x.dropna().is_unique, axis=0)].array
CodePudding user response:
You can get the column names where all values are unique as follows
all_unique = [item for item in df.columns if len(df) == len(df[item].unique())]
Then you can simply call them as usual to get the column with only unique values
df[all_unique]
CodePudding user response:
nunique
and count
df.columns[df.nunique() == df.count()]
How this works?
nunique
counts theunique
values (excluding NaN's)count
as the name suggests counts all the values (excluding NaN's)