I am trying to get a list of the 2 most frequent values in every column of my pandas DataFrame. The DataFrame looks something like this:
  column1 column2 column3
1 apple   red     cat
2 banana  blue    dog
3 grapes  yellow  cat
4 apple   blue    cat
5 banana  red     tiger
6 banana  blue    dog
I want the result as a list, something like this:
['banana', 'apple', 'blue', 'red', 'cat', 'dog']
Can someone please help me with this?
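A minimal sketch for rebuilding the sample frame in code, assuming the leading numbers 1 to 6 are the row index rather than a data column:

```python
import pandas as pd

# Sample data from the question; the leading 1-6 are assumed to be the row index
df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    },
    index=range(1, 7),
)
print(df)
```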
Answer:
Use Series.value_counts on every column and keep the top values by slicing the index (this works because value_counts sorts the counts in descending order), then convert the values to a list:
a = df.apply(lambda x: x.value_counts()[:2].index.tolist()).to_numpy().ravel('F').tolist()
print(a)
['banana', 'apple', 'blue', 'red', 'cat', 'dog']
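To see why the apply/ravel one-liner produces the values grouped by column, here is a sketch that prints the intermediate steps. It assumes a reasonably recent pandas (1.x or later), where apply turns the equal-length lists returned per column into a 2-row DataFrame; the names `top`, `by_column` and `by_row` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    }
)

# value_counts sorts by count, highest first, so the first two index labels are the top two values
print(df["column2"].value_counts())

# Each column yields a 2-element list; apply assembles them into a 2-row, 3-column DataFrame
top = df.apply(lambda x: x.value_counts()[:2].index.tolist())
print(top)

# ravel("F") reads the array column by column (Fortran order), keeping each column's values together
by_column = top.to_numpy().ravel("F").tolist()
# ravel("C") reads row by row instead, which interleaves the columns
by_row = top.to_numpy().ravel("C").tolist()
print(by_column)
print(by_row)
```

With the default row-major order you would get the first-ranked value of every column, then the second-ranked ones, which is why the answer passes `'F'`.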
A list comprehension solution that flattens the values:
a = [x for c in df.columns for x in df[c].value_counts()[:2].index]
print(a)
['banana', 'apple', 'blue', 'red', 'cat', 'dog']
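If you need a different number of values per column, the list comprehension can be wrapped in a small function. This is a sketch: `top_n_per_column` is a hypothetical helper name, and it uses Series.head(n) in place of the `[:2]` slice, which selects the same leading rows.

```python
import pandas as pd


def top_n_per_column(frame: pd.DataFrame, n: int = 2) -> list:
    """Return the n most frequent values of each column, column by column.

    Hypothetical helper wrapping the list comprehension above.
    """
    return [x for c in frame.columns for x in frame[c].value_counts().head(n).index]


df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    }
)

two = top_n_per_column(df)       # top 2 values per column
one = top_n_per_column(df, n=1)  # most frequent value per column
print(two)
print(one)
```

When two values in a column share the same count, this sketch does not control which one lands in the top n, so check for ties in your own data.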
Answer:
You can use a generator expression that calls value_counts on each column, combined with itertools.chain.from_iterable:
from itertools import chain
out = list(chain.from_iterable(df[c].value_counts()[:2].index for c in df))
output: ['banana', 'apple', 'blue', 'red', 'cat', 'dog']
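A sketch showing the two pieces this relies on: iterating over a DataFrame yields its column labels (so `for c in df` visits every column), and chain.from_iterable concatenates the per-column results into one flat sequence. The intermediate `per_column` list is added only for illustration.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame(
    {
        "column1": ["apple", "banana", "grapes", "apple", "banana", "banana"],
        "column2": ["red", "blue", "yellow", "blue", "red", "blue"],
        "column3": ["cat", "dog", "cat", "cat", "tiger", "dog"],
    }
)

# Iterating over a DataFrame yields its column labels
labels = list(df)
print(labels)

# Before flattening: one list of top-2 values per column
per_column = [df[c].value_counts()[:2].index.tolist() for c in df]
print(per_column)

# chain.from_iterable joins the inner sequences end to end into one flat list
out = list(chain.from_iterable(per_column))
print(out)
```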