Count of unique values that occur more than 100 times in a data frame

I have a DataFrame with a column named "drug_name", and I would like to get a list of all the unique values in that column together with the number of times each one occurs. To do this, I use

print(df['drug_name'].value_counts())

and

pd.value_counts(df.drug_name)

Both of these work fine, but the output is very long because many values occur only once. Is there a parameter that lets me show only the values that occur more than 100 times, so that the output is shorter and contains only the relevant values?

Answer:

`value_counts` has no threshold parameter, but you can filter the counts afterwards:

s = df['drug_name'].value_counts()
s[s.gt(100)]  # values occurring more than 100 times; use s.ge(100) for 100 or more
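If you prefer a single expression, `Series.loc` also accepts a callable, so the filter can be chained directly onto `value_counts()`. This is a minimal sketch, assuming a small made-up DataFrame and a threshold of 2 in place of 100 so the effect is visible:

```python
import pandas as pd

# Made-up sample data: 'a' appears 3 times, 'b' twice, 'c' once
df = pd.DataFrame({'drug_name': ['a', 'a', 'a', 'b', 'b', 'c']})

threshold = 2  # stands in for 100 in the real data (assumption for the demo)

# .loc with a callable receives the Series of counts and uses the boolean mask it returns
frequent = df['drug_name'].value_counts().loc[lambda s: s > threshold]
print(frequent)
```

Chaining this way avoids the temporary variable `s`, which is handy in a notebook or inside a longer method chain.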

Alternatively, since `value_counts` sorts by descending count, you can look at just the top entries:

df['drug_name'].value_counts().head(20)  # 20 most frequent values
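One caveat with `head(n)`: if several values tie at the cut-off count, it returns exactly `n` rows and silently drops the rest of the tie. `Series.nlargest` with `keep='all'` keeps every tied value instead. A small sketch with made-up data:

```python
import pandas as pd

# Made-up data: 'x' 3 times, 'y' and 'z' twice each (a tie), 'w' once
df = pd.DataFrame({'drug_name': ['x', 'x', 'x', 'y', 'y', 'z', 'z', 'w']})
counts = df['drug_name'].value_counts()

# head(2) returns exactly two rows, so one of the tied values is cut off
top_head = counts.head(2)

# nlargest(2, keep='all') also keeps every value tied with the 2nd-largest count
top_all = counts.nlargest(2, keep='all')

print(top_head)
print(top_all)
```

Here `top_all` contains `x`, `y` and `z`, while `top_head` contains only two of them.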

Answer:

Filtering the counts with a boolean mask solves the problem:


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()

# keep only values that occur more than 2 times
df_filtered = df_counted[df_counted > 2]
print(df_filtered)

This is the sample DataFrame:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the counts for each unique value:

hello    4
bye      2

These are the values that occur more than 2 times:

hello    4
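Since `value_counts` itself has no threshold argument, one option is to wrap the filter in a small helper that takes the threshold as a parameter. The name `frequent_values` and its `min_count` argument are hypothetical, not part of pandas; this is a minimal sketch using the same sample data as above:

```python
import pandas as pd

def frequent_values(series: pd.Series, min_count: int) -> pd.Series:
    """Return the counts of values that occur strictly more than `min_count` times.

    Hypothetical helper: pandas' value_counts() has no threshold argument.
    """
    counts = series.value_counts()
    return counts[counts > min_count]

# Same sample as above: 'hello' 4 times, 'bye' 2 times
df = pd.DataFrame({'drug_name': ['hello'] * 4 + ['bye'] * 2})
result = frequent_values(df['drug_name'], min_count=2)
print(result)
```

For the original question you would call `frequent_values(df['drug_name'], min_count=100)`.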