I'd like to use dtype='float32' (probably a NumPy dtype, i.e. np.float32) instead of dtype='float64' to reduce the memory usage of my pandas DataFrames, because I have to handle huge ones.
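As a quick sanity check of the memory saving, here is a small sketch that compares the two dtypes. The frame is hypothetical: one million rows of random values in the same two column names, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one million random float64 values per column.
n_rows = 1_000_000
rng = np.random.default_rng(0)
df64 = pd.DataFrame({"col1": rng.random(n_rows), "col2": rng.random(n_rows)})
df32 = df64.astype("float32")

# memory_usage() reports bytes per column; index=False leaves the index out.
bytes64 = int(df64.memory_usage(index=False, deep=True).sum())
bytes32 = int(df32.memory_usage(index=False, deep=True).sum())
print(f"float64: {bytes64:,} bytes, float32: {bytes32:,} bytes")
# float64: 16,000,000 bytes, float32: 8,000,000 bytes
```

Each float32 value takes 4 bytes instead of 8, so the column data shrinks by half. The index and any non-float columns are unaffected.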
At one point, I'd like to extract a Python list with .to_dict(orient='records') in order to get a dictionary for each row.
In this case, I get additional decimal places, which are probably caused by something like this:
Is floating point math broken?
How can I cast the data / change the type etc. in order to get the same result as I get with float64 (see the example snippets)?
import pandas as pd
_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}
_test = pd.DataFrame(_data).astype(dtype='float64')
print(f"{_test=}")
print(f"{_test.round(1)=}")
print(f"{_test.to_dict(orient='records')=}")
print(f"{_test.round(1).to_dict(orient='records')=}")
float64 output:
_test= col1 col2
0 1.45123 0.1
1 1.64123 0.2
_test.round(1)= col1 col2
0 1.5 0.1
1 1.6 0.2
_test.to_dict(orient='records')=[{'col1': 1.45123, 'col2': 0.1}, {'col1': 1.64123, 'col2': 0.2}]
_test.round(1).to_dict(orient='records')=[{'col1': 1.5, 'col2': 0.1}, {'col1': 1.6, 'col2': 0.2}]
import pandas as pd
_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}
_test = pd.DataFrame(_data).astype(dtype='float32')
print(f"{_test=}")
print(f"{_test.round(1)=}")
print(f"{_test.to_dict(orient='records')=}")
print(f"{_test.round(1).to_dict(orient='records')=}")
float32 output:
_test= col1 col2
0 1.45123 0.1
1 1.64123 0.2
_test.round(1)= col1 col2
0 1.5 0.1
1 1.6 0.2
_test.to_dict(orient='records')=[{'col1': 1.4512300491333008, 'col2': 0.10000000149011612}, {'col1': 1.6412299871444702, 'col2': 0.20000000298023224}]
_test.round(1).to_dict(orient='records')=[{'col1': 1.5, 'col2': 0.10000000149011612}, {'col1': 1.600000023841858, 'col2': 0.20000000298023224}]
Answer:
Floating-point representation has some inherent limitations (see, for example, this one).
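Calling to_dict() converts each value from its NumPy representation to a native Python float, which is a 64-bit double. Widening float32 to float64 is itself exact, but the float32 value nearest to 0.1 is not exactly 0.1, and Python prints enough digits to identify the double it holds. That is why the float32 rounding error suddenly becomes visible (e.g. 0.10000000149011612), whatever precision you choose.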
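A quick check of what happens to a single value during this conversion, using only NumPy. This is a sketch; 0.1 is simply one of the values from the question.

```python
import numpy as np

x32 = np.float32(0.1)  # nearest float32 to 0.1, stored in 32 bits
as_py = float(x32)     # native Python float (64-bit double); the widening is exact

print(str(x32))        # shortest string that round-trips as float32: 0.1
print(repr(as_py))     # shortest string that round-trips as float64: 0.10000000149011612

# Nothing was lost in the widening: converting back gives the identical float32.
assert np.float32(as_py) == x32
# But it is a different number from the double nearest to 0.1.
assert as_py != 0.1
```

NumPy's str() for a float32 scalar picks the shortest digits that identify the value among float32 numbers, while Python's float repr does the same among float64 numbers, which needs more digits for this value.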
To keep the short representation, cast your numbers to strings with the astype() function before calling to_dict():
import pandas as pd
_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}
_test = pd.DataFrame(_data).astype(dtype='float32')
print(f"{_test.round(1).astype('str').to_dict(orient='records')=}")
output:
_test.round(1).astype('str').to_dict(orient='records')=[{'col1': '1.5', 'col2': '0.1'}, {'col1': '1.6', 'col2': '0.2'}]
An alternative could be the decimal format (Python's decimal.Decimal type).
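A minimal sketch of the decimal route, assuming "decimal format" means Python's standard decimal module. Each Decimal is built from NumPy's short float32 string, because Decimal(float) would capture the full binary value (0.1000000014901161193847656...) instead.

```python
from decimal import Decimal

import pandas as pd

_data = {'col1': [1.45123, 1.64123], 'col2': [0.1, 0.2]}
_test = pd.DataFrame(_data).astype(dtype='float32')

# to_numpy() keeps the float32 dtype, so each val is a NumPy float32 scalar and
# str(val) gives its shortest float32 string ('1.5', '0.1', ...).
records = [
    {col: Decimal(str(val)) for col, val in zip(_test.columns, row)}
    for row in _test.round(1).to_numpy()
]
print(records)
# [{'col1': Decimal('1.5'), 'col2': Decimal('0.1')}, {'col1': Decimal('1.6'), 'col2': Decimal('0.2')}]
```

Decimal values keep exactly the digits they were built from, at the cost of being slower and larger than floats, so this suits the export step rather than the DataFrame itself.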