I created a list with the columns of my dataframe:
colunas = list(df.columns[9:19])
colunas
['Comunicação',
'Expertise da industria',
'Inovação',
'Parceira',
'Proatividade',
'Qualidade',
'responsividade',
'Pessoas',
'Expertise técnico',
'Pontualidade']
Here is part of my dataframe with its columns:
Company name_column total_parcial percentual
0 Company10 Comunicação 6658 22.73
1 Company10 Expertise 10049 34.30
2 Company10 Inovação 801 2.73
3 Company10 Parceira 1316 4.49
4 Company10 Proatividade 5589 19.08
... ... ... ... ...
35275 Company999 Qualidade 9102 31.07
35276 Company999 responsividade 8374 28.58
35277 Company999 Pessoas 23949 81.75
35278 Company999 Expertise 9925 33.88
35279 Company999 Pontualidade 9250 31.57
35280 rows × 4 columns
I need to create a new dataframe with the 5 rows that have the highest percentual values for each value of name_column. The output should look like this:
Company name_column total_parcial percentual
6097 Company1549 Pessoas 23949 81.75
10067 Company1908 Pessoas 23949 81.72
29527 Company48 Pessoas 23949 81.50
4387 Company1395 Pessoas 23949 81.33
13987 Company2262 Pessoas 23949 81.12
... ... ... ... ...
10672 Company1963 Inovação 801 72.73
5232 Company1471 Inovação 801 72.65
10682 Company1964 Inovação 801 72.60
32292 Company729 Inovação 801 72.51
24362 Company3204 Inovação 801 72.13
I tried to do it with a loop, but it didn't work:
lista4 = []
for coluna in df_company_top_percent[colunas]:
    x = df_company_top_percent.nlargest(5, coluna)
    lista4.append([coluna, x])
df_company_top_percent is the new dataframe I want to build. The code returns this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-32-4f5b4acd3541> in <module>()
1 lista4 = []
2
----> 3 for coluna in df_empresas_melhores_percent[colunas]:
4 x = df_empresas_melhores_percent.nlargest(5,coluna)
5 lista4.append([coluna,x])
2 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Index(['Comunicação', 'Expertise da industria', 'Inovação', 'Parceira',\n 'Proatividade', 'Qualidade', 'responsividade', 'Pessoas',\n 'Expertise técnico', 'Pontualidade'],\n dtype='object')] are in the [columns]"
How can I fix it?
Answer:
I think what you want is
top_percent = (
    df.groupby('name_column', group_keys=False)       # for each 'name_column' value,
      .apply(lambda g: g.nlargest(5, 'percentual'))  # keep the 5 rows with the highest 'percentual'
)
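A minimal, self-contained sketch of the groupby approach above, run on a small made-up frame with the same column names as the question (the companies and numbers are invented, and n=2 instead of 5 keeps it short). The second variant, sort then head, is an alternative I swapped in that avoids apply; it selects the same rows but orders them by percentual instead of by group.

```python
import pandas as pd

# Hypothetical sample data in the same long format as the question
df = pd.DataFrame({
    "Company": ["C1", "C2", "C3", "C1", "C2", "C3"],
    "name_column": ["Pessoas", "Pessoas", "Pessoas", "Inovação", "Inovação", "Inovação"],
    "total_parcial": [23949, 23949, 23949, 801, 801, 801],
    "percentual": [81.75, 81.50, 81.72, 72.13, 72.73, 72.60],
})

# For each name_column value, keep the 2 rows with the highest percentual
top = (
    df.groupby("name_column", group_keys=False)
      .apply(lambda g: g.nlargest(2, "percentual"))
)
print(top)

# Alternative without apply: sort once, then take the first 2 rows of each group
top_alt = df.sort_values("percentual", ascending=False).groupby("name_column").head(2)
print(top_alt)
```

With group_keys=False the original row labels are kept, so you can still trace each result row back to df.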
Answer:
df_company_top_percent does not have the columns you are looking for (colunas).
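Judging by the sample shown in the question, names such as 'Pessoas' and 'Inovação' are values in the name_column column of the long-format frame, not column labels, which is consistent with the KeyError. A minimal sketch of a loop that selects rows by value instead, assuming df is that long-format frame (the tiny frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical long-format sample: the category names live in the rows
df = pd.DataFrame({
    "Company": ["C1", "C2", "C1", "C2"],
    "name_column": ["Pessoas", "Pessoas", "Inovação", "Inovação"],
    "total_parcial": [23949, 23949, 801, 801],
    "percentual": [81.75, 81.50, 72.13, 72.73],
})
colunas = ["Pessoas", "Inovação"]

print("Pessoas" in df.columns)                 # False: not a column label
print((df["name_column"] == "Pessoas").any())  # True: it is a value in name_column

pieces = []
for coluna in colunas:
    rows = df[df["name_column"] == coluna]          # rows for this category only
    pieces.append(rows.nlargest(5, "percentual"))   # its (up to) 5 highest percentual rows
df_company_top_percent = pd.concat(pieces)
print(df_company_top_percent)
```

Collecting the pieces in a list and calling pd.concat once is generally cheaper than growing a DataFrame inside the loop.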
I'm not sure I understand what result you want, but if you want df_company_top_percent to be the result, build it up from the 5 largest values of each column in colunas.
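import pandas as pd

# DataFrame.append was removed in pandas 2.0; collect the pieces and concat once
pieces = []
for coluna in colunas:
    x = df.nlargest(5, coluna)[coluna]  # the 5 largest values in this column
    pieces.append(x.to_frame().T)       # one row per coluna, as append(x) did
df_company_top_percent = pd.concat(pieces)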