Can't discretize data in Python because of column name?

I have a dataset with many categorical columns, and I need to discretize the data.

First I load the data using pandas, and it gives me this:

X = pd.read_excel("/content/drive/MyDrive/APR NÃO SUP_Tarefa_Trilha 4 (2) (1).ods")
X.head()

After that I tried to discretize the data using this block of code:

coluna = ["LEG","GRANGE_REG", "SIGLA_UF", "NOME", "TIPO", "CAT_ASSOC", "NOME_MUN", "LEG"]
for col in coluna:
  classes = np.unique(X[col])
  number = 0 # value used to represent the classes
  for i in classes:
    X = X.replace(i, number)
    number = number + 1
  print('New data:')
  print(X[col])

And this code gives this error:

<ipython-input-72-c6cc213e95a5> in <module>()
      3 for col in coluna:
      4   print(col)
----> 5   classes = np.unique(X[col])
      6   number = 0 # value used to represent the classes
      7   for i in classes:

/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   2904             if self.columns.nlevels > 1:
   2905                 return self._getitem_multilevel(key)
-> 2906             indexer = self.columns.get_loc(key)
   2907             if is_integer(indexer):
   2908                 indexer = [indexer]

/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2898                 return self._engine.get_loc(casted_key)
   2899             except KeyError as err:
-> 2900                 raise KeyError(key) from err
   2901 
   2902         if tolerance is not None:

KeyError: 'GRANGE_REG'

P.S.: The column "LEG" works without a problem; the error only appears when the variable col changes to "GRANGE_REG".

P.P.S.: Sorry for my bad English.

CodePudding user response：

It's just a typo. You've written "GRANGE_REG" instead of "GRANDE_REG".
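
Once the name is corrected, note that X.replace(i, number) replaces the value everywhere in the dataframe, not only in the current column, so codes from one column can overwrite values in another. As a minimal sketch of a per-column alternative, the block below uses pd.factorize on a small toy dataframe. The sample values and the shortened column list are invented for illustration and are not taken from the original spreadsheet.

```python
import pandas as pd

# Toy dataframe standing in for the spreadsheet; the values are invented.
X = pd.DataFrame({
    "GRANDE_REG": ["Norte", "Sul", "Norte", "Sudeste"],
    "SIGLA_UF": ["AM", "RS", "PA", "SP"],
})

colunas = ["GRANDE_REG", "SIGLA_UF"]  # corrected spelling of the column name
for col in colunas:
    # sort=True numbers the classes in sorted order, like np.unique would.
    codes, classes = pd.factorize(X[col], sort=True)
    # Only this column is overwritten; other columns are left untouched.
    X[col] = codes
    print(col, "->", list(classes))

print(X)
```

Keeping the classes returned by pd.factorize makes it possible to map the integer codes back to the original labels later.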

CodePudding user response：

It seems like there's a difference between what you think the column name is and what it is in the .ods file (I'm not familiar with .ods files). There might be a missing space or something. Can you try:

print(X.columns)

That should tell you what the column name strings are in the X dataframe.

Edit: Looking closer at the image, I see that it's "GRANDE_REG" in the dataframe, but you are looking for "GRANGE_REG" (i.e. "D" swapped for "G").
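
A mismatch like this can also be caught before the loop runs. The sketch below compares the requested names against the available ones and suggests close spellings with difflib from the standard library. The helper name find_missing_columns is hypothetical, and the available list is an assumed stand-in for list(X.columns); only "GRANDE_REG" is confirmed by the question.

```python
import difflib

def find_missing_columns(requested, available):
    """Map each requested name that is absent to its closest available names."""
    suggestions = {}
    for name in requested:
        if name not in available:
            # Up to three candidates with a similarity ratio of at least 0.6.
            suggestions[name] = difflib.get_close_matches(
                name, available, n=3, cutoff=0.6
            )
    return suggestions

# Assumed stand-in for list(X.columns); in practice, read it from the dataframe.
available = ["LEG", "GRANDE_REG", "SIGLA_UF", "NOME", "TIPO", "CAT_ASSOC", "NOME_MUN"]
requested = ["LEG", "GRANGE_REG", "SIGLA_UF", "NOME", "TIPO", "CAT_ASSOC", "NOME_MUN"]

missing = find_missing_columns(requested, available)
print(missing)
```

An empty result means every requested column exists; otherwise each key is a name that would raise KeyError, paired with likely intended spellings.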