How to use .translate() correctly to remove characters that are neither letters nor digits?

Time:06-16

I want to remove symbols (most of them, but not all) from the 'Review' column of my data. A little background on my code:

from pandas import DataFrame
# data is an existing DataFrame with a text column 'Review'
# convert to lower case
data['Review'] = data['Review'].str.lower()
# remove leading and trailing whitespace
data['Review'] = data['Review'].str.strip()
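For reference, the two preprocessing steps can be chained into a single assignment. This is a minimal sketch that uses a small, hypothetical `data` frame in place of the real dataset. Note that `str.strip()` removes both leading and trailing whitespace.

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the 'Review' column matters here.
data = pd.DataFrame({'Review': ['  Great Food!  ', 'BAD service... ']})

# Lower-case, then strip leading/trailing whitespace, in one chained expression.
data['Review'] = data['Review'].str.lower().str.strip()
print(data['Review'].tolist())  # ['great food!', 'bad service...']
```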

This is what I did based on what I read online (I'm still a beginner in NLP, so don't be surprised if you find more than one mistake; I just want to know what they are):

import string
sep = '|'
punctuation_chars = '"#$%&\()* ,-./:;<=>?@[\\]^_`{}~'
mapping_table = str.maketrans(dict.fromkeys(punctuation_chars, ''))
 = sep.join(df[df(data['Review']).tolist()]).translate(mapping_table).split(sep)
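The sketch below applies the mapping table to one made-up string, which makes its behaviour easy to check. With a single dict argument, `str.maketrans()` maps each character's code point to its replacement. An empty replacement deletes the character. The question's `punctuation_chars` also contains a space (between `*` and `,`) and a backslash. As a result, spaces are deleted as well and words run together. If word boundaries should survive, remove the space from the set before building the table. The sample text is hypothetical.

```python
# Same character set as in the question; '\\(' spells out the backslash
# that the original '\(' produced implicitly.
punctuation_chars = '"#$%&\\()* ,-./:;<=>?@[\\]^_`{}~'
mapping_table = str.maketrans(dict.fromkeys(punctuation_chars, ''))

# Keys are code points; the value '' means "delete this character".
print(mapping_table[ord('#')] == '')  # True

text = 'great food, bad service (2/5)!'
print(text.translate(mapping_table))  # greatfoodbadservice25!

# Keep spaces by removing ' ' from the set before building the table.
keep_spaces = str.maketrans(dict.fromkeys(punctuation_chars.replace(' ', ''), ''))
print(text.translate(keep_spaces))  # great food bad service 25!
```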

However, I get the following error:

AttributeError: 'DataFrame' object has no attribute 'tolist'

How can I fix it? I want to use .translate() because I read that it is more efficient than other methods.

User response:

The AttributeError occurs because DataFrame has no tolist() method; only Series does. The code seems to assume that df(data['Review']) returns a Series, but wrapping a column in DataFrame(...) returns a DataFrame. Select its column first to get a Series back:

df = DataFrame(data['Review'])  # one-column DataFrame; the column keeps the name 'Review'
translated_reviews = sep.join(df['Review'].tolist()).translate(mapping_table).split(sep)
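A toy frame makes the difference easy to see. The data below is hypothetical. Wrapping a column in `DataFrame(...)` produces a one-column DataFrame, which has no `tolist()`. That new frame's column keeps the Series name, here `'Review'`. The column itself is a Series and does have `tolist()`.

```python
import pandas as pd
from pandas import DataFrame

# Hypothetical data with the same column name as in the question.
data = pd.DataFrame({'Review': ['good food!', 'slow service...']})

frame = DataFrame(data['Review'])   # one-column DataFrame
print(type(frame).__name__)         # DataFrame
print(hasattr(frame, 'tolist'))     # False, so frame.tolist() raises AttributeError
print(list(frame.columns))          # ['Review']
print(frame['Review'].tolist())     # ['good food!', 'slow service...']
```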

It's unclear whether data is a DataFrame. If it is, pass the column straight to join(), which accepts any iterable of strings. There is no need to call tolist() or create a new DataFrame.

translated_reviews = sep.join(data['Review']).translate(mapping_table).split(sep)
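Here is a sketch of the join/translate/split round trip on hypothetical data, showing both why it works and where it can fail. Joining with `sep` produces one long string, and `translate()` cleans it in a single call. `split(sep)` then cuts the result back into one piece per review. This only holds if the table does not delete `sep` and no review already contains it. Otherwise the number of pieces no longer matches the number of rows. It also assumes every value is a string, because a missing value (NaN) makes `join()` raise TypeError. As an assumption for readability, the space is left out of the character set here.

```python
import pandas as pd

sep = '|'
# The question's set minus the space (assumption), so words stay separate.
punctuation_chars = '"#$%&\\()*,-./:;<=>?@[\\]^_`{}~'
mapping_table = str.maketrans(dict.fromkeys(punctuation_chars, ''))

data = pd.DataFrame({'Review': ['good food!', 'slow (very) service...']})
translated_reviews = sep.join(data['Review']).translate(mapping_table).split(sep)
print(translated_reviews)  # ['good food!', 'slow very service']

# A review containing the separator breaks the one-piece-per-row assumption.
bad = pd.Series(['a|b', 'c'])
print(len(sep.join(bad).split(sep)), len(bad))  # 3 2
```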

User response:

Your problem is the expression df[df(data['Review']).tolist()]: it builds a new DataFrame from one column of your data DataFrame and then calls tolist() on it, which DataFrames don't have. You can either use df.values.tolist(), which converts the whole DataFrame df into a list of row lists, or, if you only need one column, use data['Review'].tolist().
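The two options return different shapes. `df.values.tolist()` returns a list of row lists, with one inner list per row even when there is only a single column. `str.join()` cannot consume that directly. `data['Review'].tolist()` returns a flat list of strings. Here is a small sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data with the same column name as in the question.
data = pd.DataFrame({'Review': ['good food!', 'slow service...']})
df = pd.DataFrame(data['Review'])

print(df.values.tolist())       # [['good food!'], ['slow service...']]
print(data['Review'].tolist())  # ['good food!', 'slow service...']

# join() needs strings, so the nested row lists make it raise TypeError.
try:
    '|'.join(df.values.tolist())
except TypeError as exc:
    print('join fails on row lists:', exc)
```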

So in your case, the final line of your code becomes:

data['Review'] = sep.join(data['Review'].tolist()).translate(mapping_table).split(sep)
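Neither answer uses it, but pandas also offers `Series.str.translate()`. It applies the same table to each row separately, so there is no separator to worry about and the cleaned values line up with the rows automatically. Whether this is faster than the join/split trick depends on the data, so it is worth timing both on the real column if speed matters. This sketch uses hypothetical data. As an assumption, the character set is the question's set minus the space.

```python
import pandas as pd

# The question's set minus the space (assumption), so words stay separate.
punctuation_chars = '"#$%&\\()*,-./:;<=>?@[\\]^_`{}~'
mapping_table = str.maketrans(dict.fromkeys(punctuation_chars, ''))

# Hypothetical reviews; the second one contains '|', which is harmless here.
data = pd.DataFrame({'Review': ['good food!', 'a|b (ok)']})

# Translate each row independently; no join/split round trip needed.
data['Review'] = data['Review'].str.translate(mapping_table)
print(data['Review'].tolist())  # ['good food!', 'a|b ok']
```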