Count of unique values that occur more than 100 times in a data frame

Time:03-24

I have a dataframe with a column called "drug_name", and I would like to get a list of all the unique values in that column along with the number of times each one occurs. For this, I use

print(df['drug_name'].value_counts()) 

and

pd.value_counts(df.drug_name)

Both of these work fine, but the output is very long because many values occur only once. Is there a parameter that lets me show only the values that occur more than 100 times, so the output is shorter and contains only the relevant values?

Answer:

You can select the values afterwards:

s = df['drug_name'].value_counts()
s[s.gt(100)]  # strictly more than 100; use s.ge(100) for "100 or more"
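The same filter can also be written as one chained expression by passing a callable to `.loc`, and since the question asks for a list of the values, the index of the filtered counts can be turned into a plain Python list. This is a minimal sketch; the sample data below is made up and only stands in for the real `drug_name` column.

```python
import pandas as pd

# Hypothetical sample data standing in for the real 'drug_name' column
df = pd.DataFrame({'drug_name': ['a'] * 150 + ['b'] * 101 + ['c'] * 100 + ['d']})

# Count, then keep only values occurring more than 100 times, in one chained expression
frequent = df['drug_name'].value_counts().loc[lambda s: s > 100]
print(frequent)

# If only the names are needed, take the index as a plain list
names = frequent.index.tolist()
print(names)
```

Note that `'c'`, with exactly 100 occurrences, is excluded because the comparison is strict.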

Alternatively, since value_counts sorts by decreasing count, you can look at only the top ones:

df['drug_name'].value_counts().head(20) # 20 top items
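One caveat with `head(n)`: it always returns exactly n rows, so values tied with the last one can be silently dropped. `Series.nlargest(n, keep='all')` keeps every tied value instead. A small sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: 'b' and 'c' are tied for second place
df = pd.DataFrame({'drug_name': ['a'] * 5 + ['b'] * 3 + ['c'] * 3 + ['d']})
counts = df['drug_name'].value_counts()

# head(2) cuts at exactly two rows, so one of the tied values is dropped
top_head = counts.head(2)

# nlargest(2, keep='all') keeps every value tied with the last one,
# so it may return more than 2 rows
top_all = counts.nlargest(2, keep='all')

print(top_head)
print(top_all)
```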

Answer:

This should solve the problem:


import pandas as pd

# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()

# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()


# keep only values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)


This is the sample dataframe:

  drug_name
0     hello
1     hello
2     hello
3     hello
4       bye
5       bye

These are the unique values counted:

hello    4
bye      2

These are the unique values that occur more than 2 times:

hello    4
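As far as I know, `value_counts` itself has no minimum-count argument (its parameters are things like `normalize`, `sort`, `ascending`, `bins` and `dropna`), so the filtering step has to happen afterwards. If you want something that behaves like the parameter asked about in the question, you could wrap both steps in a small helper. `frequent_values` and `min_count` below are hypothetical names, not part of pandas:

```python
import pandas as pd

def frequent_values(series: pd.Series, min_count: int) -> pd.Series:
    """Return counts for values occurring more than min_count times.

    Hypothetical helper: pandas' value_counts has no threshold argument.
    """
    counts = series.value_counts()    # sorted by decreasing count
    return counts[counts > min_count]  # strict comparison, as in the question

# Same sample data as the answer above
d = {'drug_name': ['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)

result = frequent_values(df['drug_name'], 2)
print(result)
```

On the real data this would be called as `frequent_values(df['drug_name'], 100)`.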