Selecting column values based on how many times they repeat

Time:11-03

Suppose I have a column in a pandas DataFrame with 1000 string values. How can I select the top 10 values based on how many times each one repeats?

data['country_state'] = data['place'].str.rsplit(',').str[-1] #column 

country_state has 1000 values. I need to select the top 10 country_state values out of those 1000, based on how many times the same string repeats.

CodePudding user response:

I think a combination of value_counts (https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html) and nlargest (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.nlargest.html) should work here:

data['country_state'].value_counts().nlargest(10)
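As a rough illustration of how that one-liner behaves, here is a small sketch. The sample "place" values are made up, the extra .str.strip() is an assumption added to drop the space left after the comma, and n=2 stands in for the 10 from the question.

```python
import pandas as pd

# Hypothetical sample data; these "place" values are invented for illustration.
data = pd.DataFrame({
    "place": [
        "Austin, Texas", "Dallas, Texas", "Houston, Texas",
        "Miami, Florida", "Tampa, Florida",
        "Portland, Oregon",
    ]
})

# Same extraction as in the question, plus strip() to remove the leading space.
data["country_state"] = data["place"].str.rsplit(",").str[-1].str.strip()

# Count each distinct value, then keep the n most frequent ones.
top = data["country_state"].value_counts().nlargest(2)
top_dict = top.to_dict()
print(top)
```

The result is a Series whose index holds the strings and whose values hold the counts. Since value_counts already returns counts in descending order, .head(2) would most likely give the same rows here.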

CodePudding user response:

Hi, you can solve this with a few pandas functions. value_counts counts how many times each value occurs and returns the counts sorted in descending order, so you can slice off the first 10 entries and read the values from the index. Here is an example:

import numpy as np
import pandas as pd

# create the dataframe; numbers are used for simplicity, but strings work the same way
n = np.random.randint(0, 50, 1000)
df_n = pd.DataFrame(n, columns=['num'])

# count occurrences of each value, sorted from most to least frequent
nreps = df_n['num'].value_counts()

# take the top ten: the index holds the values, .values holds their counts
top10_values = nreps.iloc[:10].index
top10_counts = nreps.iloc[:10].values
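If "select" means keeping only the rows whose value is among the ten most frequent, one possible follow-up is to filter with isin. This is a sketch under that assumption; the seeded generator and the name df_top10 are not from the answer above and are only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the random data used above (the seed is an assumption).
rng = np.random.default_rng(0)
df_n = pd.DataFrame({"num": rng.integers(0, 50, 1000)})

# The ten most frequent values; value_counts sorts by count, descending.
top10_values = df_n["num"].value_counts().head(10).index

# Keep only the rows whose value is one of those ten.
df_top10 = df_n[df_n["num"].isin(top10_values)]
```

The same pattern should carry over to the country_state column by swapping in data and 'country_state'.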