Suppose I have a column in a pandas DataFrame with 1000 string values. How can I select the top 10 values based on how many times each one repeats?
data['country_state'] = data['place'].str.rsplit(',').str[-1]
The column country_state has 1000 values. I need to select the top 10 country_state values out of those 1000, based on how many times the same string repeats.
CodePudding user response:
I think a combination of value_counts (https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html) and nlargest (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nlargest.html) should work here:
data['country_state'].value_counts().nlargest(10)
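Here is a small sketch of how that line behaves. The toy place values and the name top_n are made up for illustration and are not from the question, and the added str.strip() assumes there is a space after each comma:

```python
import pandas as pd

# Toy data, invented for illustration; the real frame has about 1000 rows.
data = pd.DataFrame({
    "place": [
        "Austin, Texas", "Dallas, Texas", "Houston, Texas",
        "Miami, Florida", "Tampa, Florida",
        "Portland, Oregon",
    ]
})

# Same extraction as in the question, plus strip() to drop the space after the comma.
data["country_state"] = data["place"].str.rsplit(",").str[-1].str.strip()

# Count each distinct string, then keep the n most frequent (2 here, 10 in the question).
top_n = data["country_state"].value_counts().nlargest(2)
print(top_n)
```

The result is a Series whose index holds the strings and whose values hold the counts, so top_n.index.tolist() gives just the names.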
CodePudding user response:
Hi, you can solve this with a couple of pandas functions. First, value_counts counts how many times each value occurs and returns the counts sorted from most to least frequent. Then you can slice the first 10 entries and read their index. Here is an example:
import numpy as np
import pandas as pd
# Create the DataFrame. Numbers are used for simplicity; it works the same for strings.
n = np.random.randint(0, 50, 1000)
df_n = pd.DataFrame(n, columns=['num'])
# Count the occurrences of each value (sorted from most to least frequent)
nreps = df_n['num'].value_counts()
# Take the top ten: the index holds the values, .values holds their counts
top10_values = nreps.iloc[:10].index
top10_counts = nreps.iloc[:10].values
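If you also want the rows themselves, not just the ten values, one possible follow-up is to filter with isin. This is only a sketch: the fixed seed and the name top10_rows are my own choices and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Fixed seed (chosen arbitrarily) so the sketch gives the same result on every run.
rng = np.random.default_rng(0)
df_n = pd.DataFrame({"num": rng.integers(0, 50, 1000)})

# Counts per value, most frequent first.
nreps = df_n["num"].value_counts()
top10_values = nreps.iloc[:10].index

# Keep only the rows whose value is one of the ten most frequent.
top10_rows = df_n[df_n["num"].isin(top10_values)]
print(len(top10_rows), "of", len(df_n), "rows belong to the 10 most frequent values")
```

For the question's data, the same pattern would use data['country_state'] in place of df_n['num'].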