In order to reduce the amount of memory a DataFrame takes, I have written the following function, which converts each numeric column to the smallest possible int/float type.
from pandas.api.types import is_numeric_dtype

def chng_dtypes(df):
    for col in df.columns:
        if is_numeric_dtype(df[col]):
            col_min = df[col].min()
            col_max = df[col].max()
            # Pick the smallest bit width whose signed range holds the column.
            bits = 64
            if col_min >= -2147483648 and col_max < 2147483648:
                bits = 32
            if col_min >= -32768 and col_max < 32768:
                bits = 16
            if col_min >= -128 and col_max < 128:
                bits = 8
            # Checked per column: any fractional value forces a float dtype.
            has_decimal = (df[col] % 1 != 0).any()
            if has_decimal:
                # There is no float8, so float16 is the smallest float.
                if bits == 8:
                    bits = 16
                dtype = 'float' + str(bits)
            else:
                dtype = 'int' + str(bits)
            df[col] = df[col].astype(dtype)
Is there a more efficient way to do this?
CodePudding user response:
pd.to_numeric with its downcast argument can help.
Suppose you have this DataFrame:
import pandas as pd

df = pd.DataFrame({'A': [2147483648-1, -2147483648],
                   'B': [32768-1, -32768],
                   'C': [128-1, -128],
                   'D': [128.-1, -128.],
                   'E': [65536.5, -65536.5]})
print(df.dtypes)
# Output:
A      int64
B      int64
C      int64
D    float64
E    float64
dtype: object
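for col in df.select_dtypes('number'):
    # Try the smallest integer type first; float columns that have a
    # fractional part are left unchanged by this call.
    df[col] = pd.to_numeric(df[col], downcast='integer')
    # Anything that is still float64 is downcast to the smallest float type.
    if df[col].dtype == 'float':
        df[col] = pd.to_numeric(df[col], downcast='float')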
print(df.dtypes)
# Output:
A      int32
B      int16
C       int8
D       int8
E    float32
dtype: object
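If you want to reuse this and check how much memory it actually saves, the loop can be wrapped in a small function. This is only a sketch: the name downcast_numeric is made up for illustration, it works on a copy rather than in place, and it uses pd.api.types.is_float_dtype instead of comparing the dtype to 'float'.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with numeric columns downcast (hypothetical helper)."""
    out = df.copy()
    for col in out.select_dtypes('number'):
        # Integers, and floats that hold only whole numbers, shrink to an int type.
        out[col] = pd.to_numeric(out[col], downcast='integer')
        # Columns that are still floating point are downcast as floats.
        if pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast='float')
    return out


df = pd.DataFrame({'A': [2147483648 - 1, -2147483648],
                   'C': [128 - 1, -128],
                   'E': [65536.5, -65536.5]})
small = downcast_numeric(df)

# Compare the total footprint in bytes before and after downcasting.
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Keep in mind that downcast='float' never goes below float32, and that float32 holds fewer significant digits than float64, so check that the reduced precision is acceptable for your data.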