INPUT
       A    B    C
0      1    2    3
1      4    ?    6
2      7    8    ?
...  ...  ...  ...
551    4    4    6
552    3    7    9
There might be '?' values somewhere in the data that I can't spot by eye. I tried
pd.to_numeric with errors='coerce',
but the output only shows the first 5 and last 5 rows, so I can't check every row and column for special characters.
How do I actually detect these values and clean the dataset?
Once they are detected, I know how to remove them and fill them with the respective column mean, so that part is not an issue.
I'm new to Stack Overflow and am switching from a non-IT field, so please bear with me.
CodePudding user response:
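A simple way is to count the special characters in each value of a column. Note that str.count interprets the pattern as a regular expression, so the character class below matches any one of the listed characters.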
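# Raw string so the backslash reaches the regex engine unchanged
special = r'[@_!#$%^&*()<>?/\|}{~:]'
# Number of special characters in each value of column B (string columns only)
df['B'].str.count(special)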
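To get around the truncated display and check all columns at once, one option is to build a boolean mask of the cells that fail numeric conversion. This is only a minimal sketch: the sample frame below is made up to mirror the question, and the names `coerced`, `bad_mask` and `bad_rows` are illustrative.

```python
import pandas as pd

# Hypothetical sample mirroring the question's data; '?' marks bad entries.
df = pd.DataFrame({
    "A": ["1", "4", "7", "4", "3"],
    "B": ["2", "?", "8", "4", "7"],
    "C": ["3", "6", "?", "6", "9"],
})

# Convert every column; anything that is not a number becomes NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# True where conversion failed but the original cell was not already missing.
bad_mask = coerced.isna() & df.notna()

# Only the rows that contain at least one bad cell.
bad_rows = df[bad_mask.any(axis=1)]

print(bad_mask.sum())  # count of bad cells per column
print(bad_rows)        # the offending rows, however long the frame is
```

Because `bad_rows` holds only the problem rows, printing it is not affected by the first-5/last-5 truncation unless there are a great many of them.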
Please refer to the link below for another way to do it using regex:
CodePudding user response:
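# Turn any cell containing *, & or ? into NaN; the string 'None' would leave the column non-numeric
df = df.replace(r'\*|&|\?', float('nan'), regex=True)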
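Building on the replace call, the rest of the cleanup might be sketched as follows. The sample frame and the name `cleaned` are made up for illustration, and the sketch assumes NaN (rather than the string 'None') as the replacement so that column means can be computed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; object dtype mimics a CSV column polluted by '?'.
df = pd.DataFrame(
    {"A": ["1", "4", "7"], "B": ["2", "?", "8"], "C": ["3", "6", "?"]},
    dtype=object,
)

# Cells matching the pattern become NaN.
cleaned = df.replace(r"[*&?]", np.nan, regex=True)

# The columns still hold strings, so convert them to numbers.
cleaned = cleaned.apply(pd.to_numeric, errors="coerce")

# Fill each column's gaps with that column's own mean.
cleaned = cleaned.fillna(cleaned.mean())

print(cleaned)
```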