Returning numbers equal to or less than 6 in a Python list within a for loop

In this for loop I need to get, for each column, only the values less than or equal to 6.

colunas = list(df2.columns[8:19])
colunas

['Satisfação geral',
 'Comunicação',
 'Expertise da industria',
 'Inovação',
 'Parceira',
 'Proatividade',
 'Qualidade',
 'responsividade',
 'Pessoas',
 'Expertise técnico',
 'Pontualidade']

lista = []

for coluna in colunas:
   nome_coluna = coluna
   #total_parcial = df2[coluna].count()
   df2.loc[df2[coluna]<=6].shape[0]
   percentual = df2[coluna].count() / df2[coluna].count()
   lista.append([nome_coluna,total_parcial,percentual])

df_new = pd.DataFrame(data=lista, columns=['nome_coluna','total_parcial','percentual'])

But it returns this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-120-364994f742fd> in <module>()
      4    nome_coluna = coluna
      5    #total_parcial = df2[coluna].count()
----> 6    df2.loc[df2[coluna]<=6].shape[0]
      7    percentual = df2[coluna].count() / df2[coluna].count()
      8    lista.append([nome_coluna,total_parcial,percentual])

3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in comp_method_OBJECT_ARRAY(op, x, y)
     54         result = libops.vec_compare(x.ravel(), y.ravel(), op)
     55     else:
---> 56         result = libops.scalar_compare(x.ravel(), y, op)
     57     return result.reshape(x.shape)
     58 

pandas/_libs/ops.pyx in pandas._libs.ops.scalar_compare()

TypeError: '<=' not supported between instances of 'str' and 'int'

If I run the failing line on its own for a single column, it works:

df2.loc[df2['Pontualidade'] <= 6].shape[0]

1537

What is the correct syntax? Thanks

CodePudding user response：

One or more of your columns contain non-numeric values. If you are sure the columns should all be numeric, use df2[column_name] = pandas.to_numeric(df2[column_name]) so that number strings like "123" are converted to actual numbers.
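A minimal sketch of that conversion applied to every column in the loop. The small df2 here is a made-up stand-in for the asker's data, and errors="coerce" is an optional extra (not part of the suggestion above) that turns unparseable entries into NaN instead of raising:

```python
import pandas as pd

# Hypothetical stand-in for df2: one column was read in as strings.
df2 = pd.DataFrame({
    "Qualidade": [5, 9, 3],
    "Pontualidade": ["7", "4", "n/a"],
})
colunas = ["Qualidade", "Pontualidade"]

for coluna in colunas:
    # Convert number strings such as "7" to real numbers.
    # errors="coerce" replaces values that cannot be parsed with NaN.
    df2[coluna] = pd.to_numeric(df2[coluna], errors="coerce")

# NaN compares as False, so coerced values are simply not counted.
contagens = {coluna: int((df2[coluna] <= 6).sum()) for coluna in colunas}
print(contagens)
```

Without errors="coerce", pd.to_numeric raises on a value like "n/a", which can be preferable if you want bad data to surface immediately.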

CodePudding user response：

First, your syntax there is correct. The error is related to types. It seems that some of your columns have strings instead of numbers in them, which would cause this error when comparing to a number. You can check the type of the columns with df2.dtypes.
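For instance, a short sketch that lists only the columns pandas does not treat as numeric. The df2 below is invented sample data, not the asker's DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for df2.
df2 = pd.DataFrame({
    "Qualidade": [5, 9, 3],
    "Pontualidade": ["7", "4", "n/a"],
})

# Show the dtype of every column.
print(df2.dtypes)

# Keep only the names of columns that are not numeric.
nao_numericas = [
    coluna for coluna in df2.columns
    if not pd.api.types.is_numeric_dtype(df2[coluna])
]
print(nao_numericas)
```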

CodePudding user response：

Is it possible that one of the columns you test contains strings instead of numbers? That would explain the error. A good debugging step would be to print the column name at the beginning of the loop to see in which iteration it fails.
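A rough sketch of that debugging step, using made-up data in place of df2. Instead of stopping at the first failure it catches the TypeError, so every offending column is reported (the falhas list is just an illustrative name):

```python
import pandas as pd

# Hypothetical stand-in for df2; the second column holds strings (object dtype).
df2 = pd.DataFrame({
    "Qualidade": [5, 9, 3],
    "Pontualidade": pd.Series(["7", "4", "n/a"], dtype=object),
})
colunas = list(df2.columns)

falhas = []
for coluna in colunas:
    print("checking", coluna)
    try:
        (df2[coluna] <= 6).sum()
    except TypeError as erro:
        # Comparing strings with an int raises TypeError; record the column.
        print(f"  failed: {erro}")
        falhas.append(coluna)

print(falhas)
```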

CodePudding user response：

One of your DataFrame's columns contains strings rather than numbers. If every column is supposed to be numeric, you can cast the values to numbers by adding .astype(float) to the left side of the comparison, i.e.,

df2.loc[df2[coluna].astype(float)<=6].shape[0]
# Will return the number of rows with values less than or equal to 6

You could also use .astype(int) if they should be integers. Note that either will still raise an error if your column contains values that can't be cast to numbers. Regardless, it is probably better to find out, earlier in your code, why a column you expect to be numeric isn't.

As an aside, since the comparison returns a Series of booleans, you can simplify and clarify the code by taking the sum of the booleans, i.e.,

(df2[coluna].astype(float)<=6).sum()
# Will also return the number of rows with values less than or equal to 6
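Putting it together, here is one way the original loop could look. This is a sketch on invented data, and it assumes that percentual is meant to be the share of non-null values that are less than or equal to 6 (the loop in the question divides the count by itself, which always gives 1):

```python
import pandas as pd

# Hypothetical stand-in for df2; "Pontualidade" holds number strings.
df2 = pd.DataFrame({
    "Qualidade": [5, 9, 3, 10],
    "Pontualidade": ["7", "4", "6", "2"],
})
colunas = list(df2.columns)

lista = []
for coluna in colunas:
    # Cast once per column; raises if a value cannot be converted.
    valores = df2[coluna].astype(float)
    # Summing the boolean Series counts the rows with values <= 6.
    total_parcial = int((valores <= 6).sum())
    # Assumed meaning: fraction of non-null values that are <= 6.
    percentual = total_parcial / valores.count()
    lista.append([coluna, total_parcial, percentual])

df_new = pd.DataFrame(lista, columns=["nome_coluna", "total_parcial", "percentual"])
print(df_new)
```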