How to use a list comprehension to subset a DataFrame with value_counts

make     year
honda    2011
honda    2011
honda    n/a
toyota   2011
toyota   2022

I'm trying to get a list of the makes whose value count is greater than 2. Below is my code:

list = [I for I in df.make.unique() if df.loc[df.make==I, 'make'].value_counts()>2]

For some reason, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Answer:

vc = df['make'].value_counts()
vc[vc>2].index.to_list()

Output:

['honda']
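To see what each step of the answer above does, here is a self-contained sketch on the question's data. It rebuilds only the `make` column, since `year` doesn't affect the counts, and the names `mask` and `makes` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"make": ["honda", "honda", "honda", "toyota", "toyota"]})

# One pass over the column: a Series of counts indexed by make, sorted descending.
vc = df["make"].value_counts()

# Boolean Series aligned on the make labels: honda -> True, toyota -> False.
mask = vc > 2

# Keep only the labels where the mask is True, then convert the Index to a list.
makes = vc[mask].index.to_list()
print(makes)  # ['honda']
```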

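As for your error: `value_counts()` returns a Series, so `value_counts() > 2` is a boolean Series, and `if` can't reduce a Series to a single True/False. Take its first (and only) element instead: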

[I for I in df.make.unique() if (df.loc[df.make==I, 'make'].value_counts()>2).values[0]]
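A minimal sketch reproducing the error on the question's data: the condition is a one-element boolean Series, and pandas raises when `if` tries to convert any Series to a single bool, even one of length 1. `.iloc[0]` is used here as an equivalent alternative to the answer's `.values[0]`.

```python
import pandas as pd

df = pd.DataFrame({"make": ["honda", "honda", "honda", "toyota", "toyota"]})

# Filtering to one make still leaves value_counts() returning a Series,
# so comparing it with 2 gives a one-element boolean Series.
cond = df.loc[df.make == "honda", "make"].value_counts() > 2
print(type(cond).__name__)  # Series

# `if cond:` calls bool(cond), which pandas refuses for any Series.
try:
    bool(cond)
except ValueError as exc:
    print("ValueError:", exc)

# Pulling out the single element gives a scalar that `if` accepts.
print(bool(cond.iloc[0]))  # True
```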

Answer:

`count()` is enough, since it returns a single number instead of a Series:

lst = [I for I in df.make.unique() if df.loc[df.make==I, 'make'].count()>2]
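A sketch of why `count()` works, on the question's data: on a filtered column it returns one number (the count of non-null entries), so `> 2` is an ordinary boolean. Note that the comprehension still builds a new mask over the whole column for every distinct make, so a single-pass `value_counts()` approach scales better when there are many makes.

```python
import pandas as pd

df = pd.DataFrame({"make": ["honda", "honda", "honda", "toyota", "toyota"]})

# count() returns a scalar (number of non-null values), not a Series.
n_honda = df.loc[df.make == "honda", "make"].count()
print(n_honda)  # 3

# The comparison inside `if` is therefore a plain boolean.
# Each iteration re-scans the full column to build df.make == m.
lst = [m for m in df.make.unique() if df.loc[df.make == m, "make"].count() > 2]
print(lst)  # ['honda']
```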

You can also use:

lst = df.value_counts('make')[df.value_counts('make')>2].index.tolist()

print(lst)
['honda']
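The one-liner above calls `df.value_counts('make')` twice. A small variation computes it once and reuses the result for both the mask and the labels. This is a sketch on the question's data; `counts` is just an illustrative name, and `DataFrame.value_counts` assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({"make": ["honda", "honda", "honda", "toyota", "toyota"]})

# DataFrame.value_counts with a single column name as the subset.
counts = df.value_counts("make")

# Reuse the same Series for the boolean mask and for the index labels.
lst = counts[counts > 2].index.tolist()
print(lst)  # ['honda']
```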

Answer:

Here is another way to do it:

# count rows per make into a two-column frame (make, cnt), without overwriting df
counts = df.groupby("make")['make'].count().to_frame(name='cnt').reset_index()
counts[counts.cnt > 2]['make'].to_list()

This returns a list:

['honda']
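The answers return the list of frequent makes; to actually subset the DataFrame rows, as the title asks, that list can be passed to `isin`, or the rows can be filtered per group with `groupby(...).filter`. A sketch on the question's data, assuming `year` is stored as strings because the column contains the literal `n/a`; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "make": ["honda", "honda", "honda", "toyota", "toyota"],
    "year": ["2011", "2011", "n/a", "2011", "2022"],  # assumed to be strings
})

# Option 1: find the frequent makes, then keep the rows whose make is among them.
counts = df["make"].value_counts()
frequent = counts[counts > 2].index
subset_isin = df[df["make"].isin(frequent)]

# Option 2: keep every row of each group whose size exceeds 2
# (rows come back in their original order, with their original index).
subset_filter = df.groupby("make").filter(lambda g: len(g) > 2)

print(subset_isin)
```

The `isin` route is handy when the list of makes is needed anyway; `filter` does it in one step but calls the Python lambda once per group.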