Removing punctuations in dataframe using for loop

Time:09-20

I have a dataframe that looks like this:

        A      B       C     D     E
0  Orange  Dad's  X Eyes   3d.  Navy
1   pink.  Mum's  Bored.  ooo.   NaN
2  Yellow    NaN     Sad  Gray   NaN

I'm trying to remove punctuation from every column in the dataframe using a for loop:

import string
string.punctuation

#defining the function to remove punctuation
def remove_punctuation(text):
    punctuationfree="".join([i for i in text if i not in string.punctuation])
    return punctuationfree

#storing the punctuation-free text
col=['Background','Fur','Eyes','Mouth','Clothes','Hat']

for i in col:
    df[i].apply(lambda x:remove_punctuation(x))

But I get

    "TypeError                                 Traceback (most recent call last)
    /var/folders/jd/lln92nb4p01g8grr0000gn/T/ipykernel_24651/2417883.py in <module>
         12 
         13 for i in col:
    ---> 14     df[i].apply(lambda x:remove_punctuation(x))
      
TypeError: 'float' object is not iterable" 

Can anyone help me with this, please? Any help would be greatly appreciated!

CodePudding user response:

I think some values in your dataframe are floats rather than strings (NaN, for example, is a float).

So either remove them first, or handle them inside the remove_punctuation function:

def remove_punctuation(text):
    punctuationfree= "".join([i for i in text if i not in string.punctuation]) if isinstance(text, str) else text
    return punctuationfree

This checks whether text is a string and otherwise returns it unchanged.
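
To make the diagnosis concrete: in an object-dtype column, pandas represents a missing value as NaN, which is a Python float, so iterating over it raises the error from the question. Here is a minimal sketch comparing the question's function with the guarded one. The toy Series and the name remove_punctuation_unguarded are made up for illustration.

```python
import string

import numpy as np
import pandas as pd


def remove_punctuation_unguarded(text):
    # Version from the question: iterates over whatever it is given.
    return "".join([i for i in text if i not in string.punctuation])


def remove_punctuation(text):
    # Guarded version: only strings are filtered, anything else is returned as is.
    if isinstance(text, str):
        return "".join([i for i in text if i not in string.punctuation])
    return text


# Toy data, assumed for illustration; object dtype keeps NaN as a plain float.
s = pd.Series(["Dad's", np.nan], dtype=object)

print(type(s.iloc[1]))  # the missing value is a float, not a str

try:
    s.apply(remove_punctuation_unguarded)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The guarded function cleans the string and passes NaN through.
cleaned = s.apply(remove_punctuation)
print(cleaned.tolist())
```

Note that apply returns a new Series, so in the loop from the question the result has to be assigned back (df[i] = df[i].apply(...)) for the change to stick.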

CodePudding user response:

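import pandas as pd

df = pd.DataFrame({'A': ['Orange', "pink.", "Yellow"], "B": ["3d.", "Boared", "%hgh&12"]})

for column in df:
    # regex=True is needed in recent pandas versions, where str.replace defaults to a literal match
    df[column] = df[column].str.replace(r'[^\w\s]', '', regex=True)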

Your output will look like this:

        A       B
0  Orange      3d
1    pink  Boared
2  Yellow   hgh12
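
One caveat with the [^\w\s] pattern: \w also matches the underscore, so "_" is kept, whereas string.punctuation includes it. If the goal is to strip exactly the characters in string.punctuation, one option is to build a character class from it with re.escape. This is a sketch under that assumption, with invented sample values. The .str methods also leave NaN untouched, so no explicit check is needed.

```python
import re
import string

import numpy as np
import pandas as pd

# Character class containing exactly the characters in string.punctuation.
pattern = "[" + re.escape(string.punctuation) + "]"

# Sample values are made up for illustration, including a NaN and an underscore.
df = pd.DataFrame({
    "A": ["Orange", "pink.", np.nan],
    "B": ["3d.", "snake_case", "%hgh&12"],
})

for column in df:
    # Pass regex=True explicitly so the pattern is treated as a regular expression.
    df[column] = df[column].str.replace(pattern, "", regex=True)

print(df)
```

This assumes every column holds text. Calling .str on a numeric column raises an AttributeError, so mixed frames would need the loop restricted to the text columns.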

CodePudding user response:

You are getting the error because of NaN values. Check for NaN up front:

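import string
import pandas as pd

def remove_punctuation(text):
    # NaN is a float and cannot be iterated, so return it unchanged
    if pd.isna(text):
        return text
    punctuationfree="".join([i for i in text if i not in string.punctuation])
    return punctuationfree

# apply returns a new Series, so assign it back to the column
for c in df:
    df[c] = df[c].apply(remove_punctuation)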

OUTPUT

# df
        A     B       C     D     E
0  Orange  Dads  X Eyes    3d  Navy
1    pink  Mums   Bored   ooo   NaN
2  Yellow   NaN     Sad  Gray   NaN