Pandas: select column with most unique values

Time:11-10

I have a pandas DataFrame and want to select the column with the most unique values. I have already counted the unique values per column with nunique(). How can I now pick the column with the highest nunique()?

This is my code so far:

numeric_columns = df.select_dtypes(include = (int or float))
unique = []
for column in numeric_columns:
    unique.append(numeric_columns[column].nunique())

Later I need to filter all the columns of my DataFrame based on this column (the one with the most unique values).

CodePudding user response:

Use DataFrame.select_dtypes with np.number to keep only the numeric columns, then call DataFrame.nunique to count the distinct values per column and Series.idxmax to get the name of the column with the largest count:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,3,4],'b':[1,2,2,2], 'c':list('abcd')})
print(df)
   a  b  c
0  1  1  a
1  2  2  b
2  3  2  c
3  4  2  d

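# keep only the numeric columns (integer and float dtypes)
numeric = df.select_dtypes(include=np.number)

# name of the column with the most distinct values
nu = numeric.nunique().idxmax()
print(nu)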
a
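
As a follow-up, here is a minimal sketch of how the result might be used afterwards. Two points it illustrates: `(int or float)` in the question evaluates to just `int`, so float columns would be left out, which is why np.number is used instead; and idxmax returns only the first label when several columns tie. The names `counts`, `best`, `tied` and `filtered`, and the threshold `2`, are made up for illustration, since the question does not say what kind of filtering is needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 2, 2], 'c': list('abcd')})

# Note: the expression `int or float` evaluates to `int`, so passing it to
# select_dtypes would skip float columns; np.number covers both kinds.
counts = df.select_dtypes(include=np.number).nunique()

# Label of the column with the most unique values (first one on a tie).
best = counts.idxmax()

# All columns that share the maximum count, in case ties matter.
tied = counts[counts == counts.max()].index.tolist()

# Hypothetical filter: keep rows of the full frame where that column
# exceeds an arbitrary threshold.
filtered = df[df[best] > 2]
print(best, tied)
print(filtered)
```

Since `best` is an ordinary column label, it can be used anywhere a column name is expected, for example `df[best]` or `df.sort_values(best)`.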