Iterating to capture the top 5 values for each name

I created a list with the columns of my dataframe:

colunas = list(df.columns[9:19])
colunas

['Comunicação',
 'Expertise da industria',
 'Inovação',
 'Parceira',
 'Proatividade',
 'Qualidade',
 'responsividade',
 'Pessoas',
 'Expertise técnico',
 'Pontualidade']

Here is part of my dataframe with its columns:

       Company     name_column     total_parcial  percentual
0      Company10   Comunicação     6658           22.73
1      Company10   Expertise       10049          34.30
2      Company10   Inovação        801            2.73
3      Company10   Parceira        1316           4.49
4      Company10   Proatividade    5589           19.08
...    ...         ...             ...            ...
35275  Company999  Qualidade       9102           31.07
35276  Company999  responsividade  8374           28.58
35277  Company999  Pessoas         23949          81.75
35278  Company999  Expertise       9925           33.88
35279  Company999  Pontualidade    9250           31.57
35280 rows × 4 columns

I need to create a new dataframe with the 5 highest percentual values for each value of name_column. The output should look like this:

       Company      name_column  total_parcial  percentual
6097   Company1549  Pessoas      23949          81.75
10067  Company1908  Pessoas      23949          81.72
29527  Company48    Pessoas      23949          81.50
4387   Company1395  Pessoas      23949          81.33
13987  Company2262  Pessoas      23949          81.12
...    ...          ...          ...            ...
10672  Company1963  Inovação     801            72.73
5232   Company1471  Inovação     801            72.65
10682  Company1964  Inovação     801            72.60
32292  Company729   Inovação     801            72.51
24362  Company3204  Inovação     801            72.13

I wrote this loop, but it didn't work:

lista4 = []

for coluna in df_company_top_percent[colunas]:
  x = df_company_top_percent.nlargest(5,coluna)
  lista4.append([coluna,x])

df_company_top_percent is where I am going to create the new dataframe (it appears as df_empresas_melhores_percent in the traceback). The loop raises this error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-32-4f5b4acd3541> in <module>()
      1 lista4 = []
      2 
----> 3 for coluna in df_empresas_melhores_percent[colunas]:
      4   x = df_empresas_melhores_percent.nlargest(5,coluna)
      5   lista4.append([coluna,x])

2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Index(['Comunicação', 'Expertise da industria', 'Inovação', 'Parceira',\n       'Proatividade', 'Qualidade', 'responsividade', 'Pessoas',\n       'Expertise técnico', 'Pontualidade'],\n      dtype='object')] are in the [columns]"

How can I fix it?
Thanks.

CodePudding user response：

I think what you want is

top_percent = (
    df.groupby('name_column', group_keys=False)      # for each 'name_column'
      # keep the 5 rows with the highest 'percentual' values
      .apply(lambda g: g.nlargest(5, 'percentual'))
)
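
To try the idea without the full dataset, here is a minimal sketch on made-up data. It swaps apply for sort_values followed by groupby(...).head(5), which should select the same rows when there are no ties. The frame df_demo, its company names, and its numbers are assumptions for illustration only.

```python
import pandas as pd

# Toy long-format data; company names and numbers are invented for illustration.
df_demo = pd.DataFrame({
    "Company": [f"Company{i}" for i in range(1, 9)] * 2,
    "name_column": ["Pessoas"] * 8 + ["Inovação"] * 8,
    "percentual": [81.75, 10.0, 81.5, 20.0, 81.33, 30.0, 81.12, 81.72,
                   72.73, 5.0, 72.65, 15.0, 72.6, 25.0, 72.51, 72.13],
})

# Sort once by percentual (highest first), then keep the first 5 rows of each group.
top5 = (
    df_demo.sort_values("percentual", ascending=False)
           .groupby("name_column", sort=False)
           .head(5)
)
print(top5)
```

Each name_column value ends up with at most 5 rows, and the original row index is preserved, as in the desired output.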

CodePudding user response：

df_company_top_percent does not have the columns you are looking for (colunas), which is why indexing it with that list raises the KeyError.
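
A quick way to confirm this is to compare the list against the frame's column labels. The sketch below assumes df_company_top_percent has the long layout shown in the question (its columns are not actually shown, so this is a guess); df_long is a hypothetical stand-in with two rows.

```python
import pandas as pd

# Hypothetical stand-in for the long-format frame from the question.
df_long = pd.DataFrame({
    "Company": ["Company10", "Company10"],
    "name_column": ["Comunicação", "Inovação"],
    "percentual": [22.73, 2.73],
})
colunas = ["Comunicação", "Inovação"]

# Names that are not column labels; df_long[colunas] raises KeyError for these.
missing = [c for c in colunas if c not in df_long.columns]
print(missing)

# The same names do appear as values inside name_column.
found_as_values = df_long["name_column"].isin(colunas).all()
print(found_as_values)
```

If every name is reported as missing but is found among the values of name_column, the data is in long format and the selection has to filter or group on name_column instead of indexing columns.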

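I'm not sure I understand what result you want, but if you want df_company_top_percent to hold it, take the top 5 values of each column from df and combine them at the end. The older pattern of starting from an empty dataframe and calling append in a loop no longer works, because DataFrame.append was removed in pandas 2.0.

import pandas as pd

# Top 5 values of each column in colunas, taken from df (not from the result frame)
top_values = [df.nlargest(5, coluna)[coluna] for coluna in colunas]

# One row per column name, matching what appending each Series produced
df_company_top_percent = pd.concat(top_values, axis=1).T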
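
If df is in wide format, with one column per name in colunas, the loop can also keep the company next to each value and produce the long layout asked for in the question. This is only a sketch under that assumption; df_wide, its company labels, and its numbers are made up.

```python
import pandas as pd

# Hypothetical wide-format frame: one column per name in colunas.
df_wide = pd.DataFrame({
    "Company": ["A", "B", "C", "D", "E", "F"],
    "Pessoas": [81.75, 10.0, 81.5, 20.0, 81.33, 81.12],
    "Inovação": [72.73, 5.0, 72.65, 15.0, 72.6, 72.51],
})
colunas = ["Pessoas", "Inovação"]

frames = []
for coluna in colunas:
    # 5 rows with the highest value in this column, keeping the company label
    top = (
        df_wide.nlargest(5, coluna)[["Company", coluna]]
               .rename(columns={coluna: "percentual"})
               .assign(name_column=coluna)
    )
    frames.append(top)

# Stack the per-column results and put the columns in the question's order.
result = pd.concat(frames, ignore_index=True)[["Company", "name_column", "percentual"]]
print(result)
```

Building a list of frames and calling pd.concat once avoids growing a dataframe inside the loop.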