I have a dataframe with a column called "drug_name", and I would like to get a list of all the unique values in that column along with the number of times each one occurs. For this, I use
print(df['drug_name'].value_counts())
and
pd.value_counts(df.drug_name)
Both of these work fine, but the output is very long because many values occur only once. Is there a parameter that lets me show only values with more than 100 occurrences, so that the output is shorter and I see only the relevant values?
CodePudding user response:
You can select the values afterwards:
s = df['drug_name'].value_counts()
s[s.ge(100)]
Alternatively, since value_counts sorts by decreasing count, you can look at only the top ones:
df['drug_name'].value_counts().head(20) # 20 top items
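As a minimal, self-contained sketch of the threshold filter from this answer: the data below is made up for illustration, and the threshold of 2 is a hypothetical stand-in for the 100 in the question. Note that `ge` keeps counts greater than or equal to the threshold, while `gt` keeps only counts strictly above it, which is closer to "more than 100".

```python
import pandas as pd

# Made-up data, for illustration only
df = pd.DataFrame({"drug_name": ["a", "a", "a", "b", "b", "c"]})

threshold = 2  # hypothetical; the question would use 100

counts = df["drug_name"].value_counts()
at_least = counts[counts.ge(threshold)]   # keeps counts >= threshold
more_than = counts[counts.gt(threshold)]  # keeps counts > threshold

print(at_least)
print(more_than)
```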
CodePudding user response:
This would solve the problem.
import pandas as pd
# sample dict with repeated items
d = {'drug_name':['hello', 'hello', 'hello', 'hello', 'bye', 'bye']}
df = pd.DataFrame(d)
print(df)
print()
# this gets the unique values with their respective frequency
df_counted = df['drug_name'].value_counts()
print(df_counted)
print()
# keep only the values that occur more than 2 times
df_filtered = df_counted[df_counted>2]
print(df_filtered)
This is the sample dataframe:
drug_name
0 hello
1 hello
2 hello
3 hello
4 bye
5 bye
These are the unique values counted:
hello 4
bye 2
These are the unique values with a count > 2:
hello 4
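If you prefer not to keep an intermediate variable, the same filter can be written as one chain by passing a callable to `.loc`. This is only a sketch that reuses the sample data from this answer; the name `n` for the threshold is illustrative.

```python
import pandas as pd

# Same sample data as in this answer
df = pd.DataFrame({"drug_name": ["hello", "hello", "hello", "hello", "bye", "bye"]})

n = 2  # illustrative threshold; use 100 for the original question

# .loc accepts a callable that receives the Series and returns a boolean mask
frequent = df["drug_name"].value_counts().loc[lambda s: s > n]
print(frequent)
```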