Is there a way to detect special chars such as '?', or any other, in a column of a huge dataframe?

Time:04-11

INPUT

     A   B   C
0    1   2   3
1    4   ?   6
2    7   8   ?
...  ... ... ...
551  4   4   6
552  3   7   9

There might be a '?' somewhere in between that is hard to spot. I tried detecting it with

pd.to_numeric(df['B'], errors='coerce')

but the output only shows the first 5 and last 5 rows, so I can't check every row/column for special chars.

So how do I actually deal with this problem and make the dataset clean?

Once they are detected, I know how to remove those values and fill them with their respective column means, so that's not an issue.

Please bear with me; I'm new to Stack Overflow and switching from a non-IT field.
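
(For reference, a minimal sketch of running the coerced conversion over every column so that each offending cell is listed, instead of relying on the truncated preview. The small df below is a stand-in for the real data and is an assumption, not part of the original post.)

import pandas as pd

# Stand-in frame with stray '?' values
df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, '?', 8], 'C': [3, 6, '?']})

# Coerce every column to numeric; anything non-numeric becomes NaN
coerced = df.apply(pd.to_numeric, errors='coerce')

# Cells that failed conversion but were not missing to begin with
bad = coerced.isna() & df.notna()

# List every offending (row, column) pair
bad_cells = bad.stack()
print(bad_cells[bad_cells].index.tolist())   # [(1, 'B'), (2, 'C')]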

CodePudding user response:

The below is an easy way, using str.count with a character class of the special characters:

special = r'[@_!#$%^&*()<>?/\|}{~:]'   # character class of special chars
df['B'].astype(str).str.count(special)  # count of special chars per cell in column B

Please refer to the link below to do it using regex:

regex
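
A possible extension of that idea (an assumption on top of the answer, not from the original): run the same count over every column and keep only the rows that contain at least one special character.

import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, '?', 8], 'C': [3, 6, '?']})
special = r'[@_!#$%^&*()<>?/\|}{~:]'

# Count special characters in every cell of every column
counts = df.astype(str).apply(lambda col: col.str.count(special))

# Show only the rows that contain at least one special character
print(df[counts.sum(axis=1) > 0])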

CodePudding user response:

import numpy as np
df.replace(r'\*|&|\?', np.nan, regex=True)   # NaN, not the string 'None', so the cells register as missing
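
Putting this together with the mean-fill the asker already knows how to do, a sketch (the sample df and the use of NaN rather than the string 'None' are assumptions, not from the original answer):

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 7], 'B': [2, '?', 8], 'C': [3, 6, '?']})

# Turn the special markers into NaN, then make every column numeric
cleaned = df.replace(r'\*|&|\?', np.nan, regex=True).apply(pd.to_numeric)

# Fill each NaN with its column mean
cleaned = cleaned.fillna(cleaned.mean())
print(cleaned)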