How to remove all special characters and letters from a column in a DataFrame in Python Pandas?

I have a DataFrame like the one below in Python Pandas ("col1" has a string data type):

col1
-----
1234AABY332
857363opx00C* 
9994TyF@@@!
...

I need to remove all special characters, such as ["-", ",", ".", ":", "/", "@", "#", "&", "$", "%", " ", "*", "(", ")", "=", "!", "", "~", "~"], and all letters (both uppercase and lowercase), for example A, a, b, c, and so on.

As a result, I need a DataFrame like the one below:

col1
-----
1234332
85736300
9994
...

How can I do that in Python Pandas?

CodePudding user response：

I might phrase your requirement as removing all non-digit characters:

df["col1"] = df["col1"].str.replace(r'\D+', '', regex=True)
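
As a quick check, here is a minimal self-contained sketch of this approach. The DataFrame is built by hand from the sample rows in the question, which is an assumption about how the real data looks:

```python
import pandas as pd

# Sample rows copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C* ", "9994TyF@@@!"]})

# \D matches any character that is not a digit; + removes each run in one substitution.
df["col1"] = df["col1"].str.replace(r"\D+", "", regex=True)

print(df["col1"].tolist())  # ['1234332', '85736300', '9994']
```

The column still holds strings after this step, so leading zeros (if any) are preserved.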

CodePudding user response：

You can also use findall to extract only the digits:

df['col1'] = df['col1'].str.findall(r'(\d)').str.join('')
print(df)

# Output
       col1
0   1234332
1  85736300
2      9994

You can append .astype(int) to convert the digit strings to integers:
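
A minimal sketch of that conversion, assuming the sample rows from the question (the DataFrame construction is added here only for illustration):

```python
import pandas as pd

# Sample rows copied from the question; building the frame this way is an assumption.
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C* ", "9994TyF@@@!"]})

# Collect every digit per row, join them back into one string, then cast to int.
df["col1"] = df["col1"].str.findall(r"\d").str.join("").astype(int)

print(df["col1"].tolist())  # [1234332, 85736300, 9994]
```

Note that a row containing no digits at all would produce an empty string, and .astype(int) would raise on it. In that case pd.to_numeric(..., errors="coerce") is one possible alternative, since it yields NaN for such rows instead of failing.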