I have a DataFrame that looks like this:
        A      B       C     D     E
0  Orange  Dad's  X Eyes   3d.  Navy
1   pink.  Mum's  Bored.  ooo.   NaN
2  Yellow    NaN     Sad  Gray   NaN
I'm trying to remove punctuation from every column in the DataFrame using a for loop:
import string
string.punctuation

# defining the function to remove punctuation
def remove_punctuation(text):
    punctuationfree = "".join([i for i in text if i not in string.punctuation])
    return punctuationfree

# storing the punctuation-free text
col = ['Background', 'Fur', 'Eyes', 'Mouth', 'Clothes', 'Hat']
for i in col:
    df[i].apply(lambda x: remove_punctuation(x))
But I get this error:
TypeError                                 Traceback (most recent call last)
/var/folders/jd/lln92nb4p01g8grr0000gn/T/ipykernel_24651/2417883.py in <module>
     12
     13 for i in col:
---> 14     df[i].apply(lambda x:remove_punctuation(x))

TypeError: 'float' object is not iterable
Can anyone help me with this? Any help would be greatly appreciated!
Answer:
I think you have some float values in your DataFrame (the NaN entries are floats). Either remove them, or skip them inside the remove_punctuation function:
def remove_punctuation(text):
    punctuationfree = "".join([i for i in text if i not in string.punctuation]) if isinstance(text, str) else text
    return punctuationfree
This checks whether text is a string and, if it isn't, returns it unchanged.
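A minimal, self-contained check of this approach on the question's data. The DataFrame below is reconstructed from the table in the question (an assumption, since the real columns are named differently). It also confirms that NaN is a float, which is why iterating over it raised the TypeError:

```python
import string

import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (assumed).
df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["X Eyes", "Bored.", "Sad"],
    "D": ["3d.", "ooo.", "Gray"],
    "E": ["Navy", np.nan, np.nan],
})

# NaN is a Python float, and iterating over a float is what raised the TypeError.
assert isinstance(np.nan, float)

def remove_punctuation(text):
    # Only strings are cleaned; NaN and any other non-string value pass through unchanged.
    if isinstance(text, str):
        return "".join(ch for ch in text if ch not in string.punctuation)
    return text

# apply() returns a new Series, so the result has to be assigned back to the column.
for column in df.columns:
    df[column] = df[column].apply(remove_punctuation)

print(df)
```

Note the assignment inside the loop: the loop in the question calls apply() but discards the result, so even without the error the DataFrame would not have changed.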
Answer:
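import pandas as pd

df = pd.DataFrame({"A": ["Orange", "pink.", "Yellow"], "B": ["3d.", "Boared", "%hgh&12"]})
for column in df:
    # Remove every character that is neither a word character nor whitespace
    df[column] = df[column].str.replace(r"[^\w\s]", "", regex=True)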
Your output will look like this:
A B
0 Orange 3d
1 pink Boared
2 Yellow hgh12
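One version-dependent caveat: since pandas 2.0, Series.str.replace treats the pattern as a literal string unless regex=True is passed, so the flag is needed for the pattern to act as a regex. A sketch running the same idea on the question's data (reconstructed from its table, so assumed), which also shows that the .str methods leave NaN values as NaN instead of raising:

```python
import numpy as np
import pandas as pd

# The question's sample data, reconstructed from its table (assumed).
df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["X Eyes", "Bored.", "Sad"],
    "D": ["3d.", "ooo.", "Gray"],
    "E": ["Navy", np.nan, np.nan],
})

for column in df.columns:
    # [^\w\s] matches any character that is neither a word character nor whitespace.
    # regex=True is required on pandas >= 2.0, where the default became regex=False.
    df[column] = df[column].str.replace(r"[^\w\s]", "", regex=True)

print(df)
```

One difference from the string.punctuation approach: \w includes the underscore, so this pattern keeps "_", while string.punctuation would remove it.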
Answer:
You are getting the error because of the NaN values: NaN is a float, and a float is not iterable. Check for NaN up front:
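import string

import pandas as pd

def remove_punctuation(text):
    # NaN is a float and cannot be iterated over, so return it unchanged
    if pd.isna(text):
        return text
    punctuationfree = "".join([i for i in text if i not in string.punctuation])
    return punctuationfree

# apply() returns a new Series, so assign the result back to the column
for c in df:
    df[c] = df[c].apply(remove_punctuation)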
OUTPUT
# df
        A     B       C     D     E
0  Orange  Dads  X Eyes    3d  Navy
1    pink  Mums   Bored   ooo   NaN
2  Yellow   NaN     Sad  Gray   NaN
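If you would rather not write a per-cell function at all, one alternative (a different technique from the answers above) is Series.str.translate with a table from str.maketrans that deletes every character in string.punctuation. This removes exactly the same characters as remove_punctuation, and missing values stay NaN. A sketch, again on data reconstructed from the question's table (assumed):

```python
import string

import numpy as np
import pandas as pd

# The question's sample data, reconstructed from its table (assumed).
df = pd.DataFrame({
    "A": ["Orange", "pink.", "Yellow"],
    "B": ["Dad's", "Mum's", np.nan],
    "C": ["X Eyes", "Bored.", "Sad"],
    "D": ["3d.", "ooo.", "Gray"],
    "E": ["Navy", np.nan, np.nan],
})

# Translation table mapping every punctuation character to None, i.e. deleting it.
table = str.maketrans("", "", string.punctuation)

for column in df.columns:
    # .str.translate applies the table to each string; NaN entries are left as NaN.
    df[column] = df[column].str.translate(table)

print(df)
```

The .str accessor only works on string (or object) columns; a purely numeric column would raise an AttributeError, so restrict the loop to text columns if your DataFrame has any.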