I have a DataFrame like the one below in Python Pandas ("col1" has the string data type):
col1
-----
1234AABY332
857363opx00C*
9994TyF@@@!
...
And I need to remove all special characters like: ["-", ",", ".", ":", "/", "@", "#", "&", "$", "%", " ", "*", "(", ")", "=", "!", "
", "~", "~"]
and letters (both uppercase and lowercase), for example: A, a, b, c and so on,
so as a result I need a DataFrame like the one below:
col1
-----
1234332
85736300
9994
...
How can I do that in Python Pandas?
CodePudding user response:
I might phrase your requirement as removing all non-digit characters:
df["col1"] = df["col1"].str.replace(r'\D+', '', regex=True)
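As a rough, self-contained sketch of this approach, the snippet below rebuilds the three sample rows from the question (the DataFrame construction is an assumption, since the question only shows the values) and strips every run of non-digit characters. The pattern `\D+` matches one or more characters that are not digits, so letters and special characters are removed in one pass.

```python
import pandas as pd

# Sample data copied from the question; how the real DataFrame is built is assumed.
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*", "9994TyF@@@!"]})

# Replace each run of non-digit characters with an empty string.
df["col1"] = df["col1"].str.replace(r"\D+", "", regex=True)

print(df["col1"].tolist())
# ['1234332', '85736300', '9994']
```

The column still holds strings after this step, so leading zeros, if any, are preserved.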
CodePudding user response:
You can also use findall to extract only the digits:
df['col1'] = df['col1'].str.findall(r'(\d)').str.join('')
print(df)
# Output
col1
0 1234332
1 85736300
2 9994
You can append .astype(int) to convert the digits to a number.
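A minimal sketch of that conversion, assuming the same three sample rows as in the question (the DataFrame construction here is only for illustration):

```python
import pandas as pd

# Sample data copied from the question; how the real DataFrame is built is assumed.
df = pd.DataFrame({"col1": ["1234AABY332", "857363opx00C*", "9994TyF@@@!"]})

# Collect the digits of each row, join them back into one string,
# then cast the resulting strings to integers.
df["col1"] = df["col1"].str.findall(r"\d").str.join("").astype(int)

print(df["col1"].tolist())
# [1234332, 85736300, 9994]
```

Note that .astype(int) raises an error if some row contains no digits at all, because the empty string cannot be parsed as an integer. In that case something like pd.to_numeric(..., errors="coerce") may be a safer choice, at the cost of getting a float column with NaN for those rows.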