how do i convert a number dataframe column in millions to double or float
CodePudding user response:
You will need to remove the full stops. You can use pandas replace method then convert it into a float:
df['col'] = df['col'].replace('\.', '', regex=True).astype('float')
Example
>>> df = pd.DataFrame({'A': ['1.1.1', '2.1.2', '3.1.3', '4.1.4']})
>>> df
A
0 1.1.1
1 2.1.2
2 3.1.3
3 4.1.4
>>> df['col'] = df['col'].replace('\.', '', regex=True).astype('float')
>>> df['A']
A
0 111.0
1 212.0
2 313.0
3 414.0
>>> df['A'].dtype
float64
I'm assuming that because you have two full stops that the data is of type string. However, this should work even if you have some integers or floats in that column as well.
CodePudding user response:
my_col_name
0 1.598.248
1 1.323.373
2 1.628.781
3 1.551.707
4 1.790.930
5 1.877.985
6 1.484.103
With the df
above, you can try below code, with 3 steps: (1) change column type to string, (2) do string replace character, (3) change column type to float
col = 'my_col_name'
df[col] = df[col].astype('str')
df[col] = df[col].str.replace('.','')
df[col] = df[col].astype('float')
print(df)
Please note the above will result in a warning: FutureWarning: The default value of regex will change from True to False in a future version. In addition, single character regular expressions will *not* be treated as literal strings when regex=True.
So you could use below code with regex=True
, also, I've combined in 1 line:
df[col] = df[col].astype('str').str.replace('.','', regex=True).astype('float')
print(df)
Output
my_col_name
0 15982480.0
1 13233730.0
2 16287810.0
3 15517070.0
4 17909300.0
5 18779850.0
6 14841030.0