One column of my dataset looks like this:
0 10,000+
1 500,000+
2 5,000,000+
3 50,000,000+
4 100,000+
Name: Installs, dtype: object
and I want to convert these 'xxx,yyy,zzz+' strings to integers. First I tried this:
df['Installs'] = pd.to_numeric(df['Installs'])
and I got this error:
ValueError: Unable to parse string "10,000+" at position 0
Then I tried to remove the '+' and ',' characters like this:
df['Installs'] = df['Installs'].str.replace('+','',regex = True)
df['Installs'] = df['Installs'].str.replace(',','',regex = True)
but nothing changed!
How can I convert these strings to integers?
CodePudding user response:
With regex=True, the + (plus) character is interpreted specially: it is a regex quantifier meaning "one or more of the preceding item", not a literal plus sign. You can either disable regular expression replacement (regex=False) or, even better, change your regular expression to a character class that matches + or , and removes both at once:
df['Installs'] = df['Installs'].str.replace('[+,]', '', regex=True).astype(int)
Output:
>>> df['Installs']
0 10000
1 500000
2 5000000
3 50000000
4 100000
Name: 0, dtype: int64
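For completeness, here is a minimal sketch of the regex=False route, which treats '+' and ',' as literal text instead of regex syntax. The sample frame is made up for illustration and assumes the values carry a trailing '+':

```python
import pandas as pd

# Hypothetical sample data, assumed to resemble the Installs column
df = pd.DataFrame({"Installs": ["10,000+", "500,000+", "5,000,000+"]})

# regex=False makes str.replace do plain substring replacement,
# so '+' needs no escaping
cleaned = (
    df["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
)

# Only digits remain, so the cast to int succeeds
df["Installs"] = cleaned.astype(int)
print(df["Installs"].tolist())
```

This takes two passes over the column instead of one, but it avoids any surprises from regex metacharacters.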
CodePudding user response:
'+' on its own is not a valid regex. Strip every non-digit character instead:
df['Installs'] = pd.to_numeric(df['Installs'].str.replace(r'\D', '', regex=True))
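A small sketch of how the \D approach behaves, using a made-up Series (the values are assumed, not taken from the real dataset). Note that \D removes every non-digit character, so it would also strip minus signs and decimal points. That is fine for whole-number install counts but probably not for general numeric text:

```python
import pandas as pd

# Hypothetical sample values for illustration
s = pd.Series(["10,000+", "500,000+", "100,000+"], name="Installs")

# \D matches any character that is not a digit
digits_only = s.str.replace(r"\D", "", regex=True)
result = pd.to_numeric(digits_only)
print(result.tolist())

# Caveat: signs and decimal points are removed as well
mangled = pd.Series(["-1.5"]).str.replace(r"\D", "", regex=True).iloc[0]
print(mangled)  # the string "15", not -1.5
```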