Creating a list of the top values in all columns with pandas

I am trying to get a list of the top 2 values (by value counts) in every column of my pandas DataFrame. The DataFrame looks something like this:

   column1 column2 column3
1    apple     red     cat
2   banana    blue     dog
3   grapes  yellow     cat
4    apple    blue     cat
5   banana     red   tiger
6   banana    blue     dog

I want the result to be in the form of a list. Something like this:

 ['banana', 'apple', 'blue', 'red', 'cat', 'dog']

Can someone please help me with this?

CodePudding user response:

Use Series.value_counts on each column and take the first two index labels with a slice (this works because value_counts sorts by count in descending order), then flatten the result and convert it to a list:

a = df.apply(lambda x: x.value_counts()[:2].index.tolist()).to_numpy().ravel('F').tolist()
print(a)
['banana', 'apple', 'blue', 'red', 'cat', 'dog']
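
To see why ravel('F') is used, it can help to look at the intermediate result. The sketch below rebuilds the example frame from the question (the construction code is an assumption, since the question only shows the printed table) and relies on apply assembling the equal-length lists into a DataFrame with one column per original column, which is what the output above implies:

```python
import pandas as pd

# Example frame rebuilt from the table in the question (assumed construction).
df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    },
    index=range(1, 7),
)

# One list of two labels per column; head(2) is equivalent to the [:2] slice.
# Because the lists have equal length, apply assembles them into a 2 x 3
# DataFrame (rows = rank, columns = original column).
top = df.apply(lambda x: x.value_counts().head(2).index.tolist())

# Column-major ('F') order walks down each column first, so the two labels
# of a column stay next to each other.
by_column = top.to_numpy().ravel("F").tolist()

# Row-major ('C', the default) interleaves the columns instead.
by_rank = top.to_numpy().ravel("C").tolist()

print(by_column)
print(by_rank)
```

With the example data, by_column is ['banana', 'apple', 'blue', 'red', 'cat', 'dog'], while by_rank is ['banana', 'blue', 'cat', 'apple', 'red', 'dog'].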

A list comprehension that flattens the values directly:

a = [x for c in df.columns for x in df[c].value_counts()[:2].index]
print(a)
['banana', 'apple', 'blue', 'red', 'cat', 'dog']
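
If some column has fewer than two distinct values, the per-column lists no longer have equal length, so the apply/ravel approach may not flatten cleanly, whereas the list comprehension simply yields fewer items for that column. A minimal sketch wrapping it in a helper (top_values and its n parameter are hypothetical names, not part of pandas, and the demo frame is made up for illustration):

```python
import pandas as pd


def top_values(df: pd.DataFrame, n: int = 2) -> list:
    """Return the n most frequent values of each column, flattened column by column."""
    return [
        value
        for column in df.columns
        # value_counts sorts by frequency (descending), so head(n) keeps the top n;
        # a column with fewer than n distinct values just contributes fewer items.
        for value in df[column].value_counts().head(n).index
    ]


# Small illustrative frame (assumed data): column "b" has a single distinct value.
demo = pd.DataFrame({"a": ["x", "x", "y"], "b": ["k", "k", "k"]})
print(top_values(demo))
```

For this demo frame the helper returns ['x', 'y', 'k']: two values from "a" and only one from "b".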

CodePudding user response:

You can use a generator expression that calls value_counts on each column, combined with itertools.chain.from_iterable to flatten the results:

from itertools import chain

out = list(chain.from_iterable(df[c].value_counts()[:2].index for c in df))

output: ['banana', 'apple', 'blue', 'red', 'cat', 'dog']