I want to eliminate all the digit, whitespace and parenthesis in the column of dataframe. So, I wrote the following code.
df_energy['Country'].replace(r'\d* \(.*\)','',regex=True,inplace=True)
However, it only eliminate whitespace and parenthesis.
{'China2',''China, Hong Kong Special Administrative Region3'}
Items with digit at the end still remains the same. May I know which part of the statement I miswrote.
CodePudding user response:
Your regex specifically looks for the pattern:
meaning that an input like 1 (string)
would match.
If you want to eliminate the charactes in the input regardless of order, I would suggest something like a simple list of characters to look for: r'[\d ()] '
The screenshots come from CyrilEx https://extendsclass.com/ that can visualize regex and help you debug patterns