Pandas: Changing a column of a dataset from string to integer

Time:12-07

One column of my dataset looks like this:

0        10,000+
1       500,000+
2     5,000,000+
3    50,000,000+
4       100,000+
Name: Installs, dtype: object

and I want to change these 'xxx,yyy,zzz+' strings to integers. First I tried this function:

df['Installs'] = pd.to_numeric(df['Installs'])

and I got this error:

ValueError: Unable to parse string "10,000" at position 0
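A minimal sketch of why this first attempt fails: pd.to_numeric only accepts strings that already look like plain numbers, so the thousands separator alone is enough to trigger the error. The sample values here are assumptions chosen to mirror the column above.

```python
import pandas as pd

# Digit-only strings parse cleanly into integers.
ok = pd.to_numeric(pd.Series(["10000", "500000"]))
print(ok.tolist())  # [10000, 500000]

# A thousands separator is not part of a numeric literal, so the default
# errors="raise" makes pd.to_numeric raise a ValueError instead.
try:
    pd.to_numeric(pd.Series(["10,000", "500,000"]))
    commas_parsed = True
except ValueError as exc:
    commas_parsed = False
    print("ValueError:", exc)
```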

and then I tried to remove '+' and ',' with this method:

df['Installs'] = df['Installs'].str.replace('+','',regex = True)
df['Installs'] = df['Installs'].str.replace(',','',regex = True)

but nothing changed!

How can I convert these strings to integers?

Answer:

With regex=True, the + (plus) character is interpreted specially, as a regex quantifier. You can either disable regular-expression replacement (regex=False) or, better, change your regular expression to match either + or , and remove both at once:

df['Installs'] = df['Installs'].str.replace('[+,]', '', regex=True).astype(int)

Output:

>>> df['Installs']
0       10000
1      500000
2     5000000
3    50000000
4      100000
Name: Installs, dtype: int64
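A quick way to check both suggestions from this answer side by side. This is a sketch that assumes the raw values carry a trailing + (the plus character the answer refers to); the installs Series is a hypothetical stand-in for df['Installs']. It confirms that a bare + does not compile as a regular expression, and that regex=False and the [+,] character class produce the same integers.

```python
import re
import pandas as pd

# Hypothetical sample mirroring the Installs column, including the trailing "+".
installs = pd.Series(
    ["10,000+", "500,000+", "5,000,000+", "50,000,000+", "100,000+"],
    name="Installs",
)

# A bare "+" is a quantifier with nothing to repeat, so it is not a valid regex.
try:
    re.compile("+")
    bare_plus_is_valid = True
except re.error as exc:
    bare_plus_is_valid = False
    print("re.error:", exc)

# Option 1: regex=False treats each pattern as a literal substring.
literal = (
    installs.str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(int)
)

# Option 2: inside a character class "+" is literal, so one call strips both.
char_class = installs.str.replace("[+,]", "", regex=True).astype(int)

print(char_class.tolist())  # [10000, 500000, 5000000, 50000000, 100000]
print(literal.equals(char_class))  # True
```

The character-class version needs a single replace call; the regex=False chain avoids regex syntax entirely, at the cost of one call per character to drop.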

Answer:

A bare + is not a valid regex (it is a quantifier with nothing to repeat), so strip every non-digit character instead:

df['Installs'] = pd.to_numeric(df['Installs'].str.replace(r'\D', '', regex=True))
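A sketch of this \D approach on a hypothetical sample. \D matches any character that is not a digit, so commas, a trailing + and stray spaces all disappear in one call. One caveat: it also strips a decimal point or a leading minus sign, so it only suits non-negative whole numbers such as install counts.

```python
import pandas as pd

# Hypothetical stand-in for df['Installs'], including a stray leading space.
installs = pd.Series(["10,000+", " 500,000+", "5,000,000+"], name="Installs")

# r"\D" matches every non-digit character; removing them leaves bare digits.
digits_only = installs.str.replace(r"\D", "", regex=True)
counts = pd.to_numeric(digits_only)
print(counts.tolist())  # [10000, 500000, 5000000]

# Caveat: the minus sign and the decimal point are non-digits too.
lossy = pd.to_numeric(pd.Series(["-1.5"]).str.replace(r"\D", "", regex=True))
print(lossy.tolist())  # [15]
```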