How to remove all special characters and letters from column in DataFrame in Python Pandas?

Time:07-12

I have a DataFrame like the one below in Python Pandas ("col1" has data type string):

col1
-----
1234AABY332
857363opx00C* 
9994TyF@@@!
...

And I need to remove all special characters, such as ["-", ",", ".", ":", "/", "@", "#", "&", "$", "%", " ", "*", "(", ")", "=", "!", "", "~", "~"], and all letters (both uppercase and lowercase), for example A, a, b, c and so on.

So, as a result, I need a DataFrame like the one below:

col1
-----
1234332
85736300
9994
...

How can I do that in Python Pandas?

Answer:

I might phrase your requirement as removing all non-digit characters:

df["col1"] = df["col1"].str.replace(r'\D+', '', regex=True)
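A minimal runnable sketch of this answer on the question's sample data. As an assumption about what the question might have intended, it also shows a narrower variant that removes only the listed characters plus ASCII letters (the empty entry and the duplicate "~" from the question's list are left out). The value "12_3?" is hypothetical and only illustrates where the two patterns differ: `\D+` removes every non-digit, while the explicit character class keeps anything not listed.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*", "9994TyF@@@!"]})

# Approach from the answer: drop every run of non-digit characters
digits_only = df["col1"].str.replace(r"\D+", "", regex=True)
print(digits_only.tolist())  # ['1234332', '85736300', '9994']

# Narrower variant (assumption): drop only the characters listed in the
# question plus ASCII letters; "-" is placed first so it is a literal.
pattern = r"[-,.:/@#&$%*()=!~ A-Za-z]+"
listed_removed = df["col1"].str.replace(pattern, "", regex=True)
print(listed_removed.tolist())  # ['1234332', '85736300', '9994']

# Hypothetical value showing the difference between the two patterns
extra = pd.Series(["12_3?"])
print(extra.str.replace(r"\D+", "", regex=True).tolist())  # ['123']
print(extra.str.replace(pattern, "", regex=True).tolist())  # ['12_3?']
```

On the question's data both patterns give the same result; they only diverge on characters that are neither digits nor in the list.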

Answer:

You can also use str.findall to extract only the digits:

df['col1'] = df['col1'].str.findall(r'(\d)').str.join('')
print(df)

# Output
       col1
0   1234332
1  85736300
2      9994
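A small sketch of a variation on the findall approach, using the question's sample data. The pattern `\d+` (instead of `(\d)`) matches whole runs of digits, so each row yields fewer list items; because there is no capture group, `findall` returns the full matches, and joining them gives the same strings as joining single digits.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*", "9994TyF@@@!"]})

# Each row becomes a list of digit runs rather than single digits
runs = df["col1"].str.findall(r"\d+")
print(runs.tolist())  # [['1234', '332'], ['857363', '00'], ['9994']]

# Concatenating the runs reproduces the digits-only strings
joined = runs.str.join("")
print(joined.tolist())  # ['1234332', '85736300', '9994']
```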

You can append .astype(int) to convert the resulting digit strings to integers.
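One caveat, sketched under an assumption: if some row contains no digits at all (the "abc" row below is hypothetical, not from the question), cleaning it leaves an empty string, and `.astype(int)` raises a `ValueError` on it. A possible workaround is `pd.to_numeric(..., errors="coerce")`, which turns such rows into missing values, followed by the nullable `Int64` dtype so the column stays integer-typed. Note also that converting to numbers drops leading zeros (for example "0012" becomes 12).

```python
import pandas as pd

# Sample data from the question plus a hypothetical row without digits
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*", "9994TyF@@@!", "abc"]})

digits = df["col1"].str.replace(r"\D+", "", regex=True)

# digits.astype(int) would fail on the empty string left from "abc";
# errors="coerce" maps it to NaN, and "Int64" keeps an integer dtype
numbers = pd.to_numeric(digits, errors="coerce").astype("Int64")
print(numbers)
```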
