Reducing dtypes to save memory

Time:12-03

To reduce the amount of memory a DataFrame takes, I have written the following function, which converts each numeric column to the smallest int/float dtype that can hold its values.

from pandas.api.types import is_numeric_dtype

def chng_dtypes(df):
    # Modifies df in place, one numeric column at a time
    for col in df.columns:
        if is_numeric_dtype(df[col]):
            col_min = df[col].min()
            col_max = df[col].max()
            # Pick the smallest bit width whose signed range holds the column
            bits = 64
            if col_min >= -2147483648 and col_max < 2147483648:
                bits = 32
            if col_min >= -32768 and col_max < 32768:
                bits = 16
            if col_min >= -128 and col_max < 128:
                bits = 8
            if (df[col] % 1 != 0).any():
                # Fractional values need a float dtype; there is no float8
                if bits == 8:
                    bits = 16
                dtype = 'float' + str(bits)
            else:
                dtype = 'int' + str(bits)
            df[col] = df[col].astype(dtype)

Is there a more efficient way to do this?

CodePudding user response:

pd.to_numeric can help:

Suppose this dataframe:

import pandas as pd

df = pd.DataFrame({'A': [2147483648-1, -2147483648],
                   'B': [32768-1, -32768],
                   'C': [128-1, -128],
                   'D': [128.-1, -128.],
                   'E': [65536.5, -65536.5]})
print(df.dtypes)

# Output:
A    int64
B    int64
C    int64
D    float64
E    float64
dtype: object
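
Downcast each numeric column, trying integers first and falling back to a smaller float:

for col in df.select_dtypes('number'):
    # Converts to the smallest integer dtype if every value is a whole number
    df[col] = pd.to_numeric(df[col], downcast='integer')
    # Columns with fractional values are still float64 at this point
    if df[col].dtype == 'float':
        df[col] = pd.to_numeric(df[col], downcast='float')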
print(df.dtypes)

# Output:
A      int32
B      int16
C       int8
D       int8
E    float32
dtype: object
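
To check how much memory the downcast actually saves, the loop can be wrapped in a small helper and the result compared with `DataFrame.memory_usage`. This is only a sketch: the name `downcast_numeric` is made up for illustration, and returning a copy instead of modifying the frame in place is an assumption, not part of the answer above.

```python
import pandas as pd

def downcast_numeric(df):
    """Return a copy of df with numeric columns downcast where possible.

    Hypothetical helper: same loop as above, applied to a copy.
    """
    out = df.copy()
    for col in out.select_dtypes('number'):
        # Whole-number columns become the smallest fitting integer dtype
        out[col] = pd.to_numeric(out[col], downcast='integer')
        # Anything still float is downcast to a smaller float if possible
        if pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast='float')
    return out

df = pd.DataFrame({'A': [2147483647, -2147483648],
                   'E': [65536.5, -65536.5]})
small = downcast_numeric(df)

# Bytes used by the column data, ignoring the index
before = df.memory_usage(index=False).sum()
after = small.memory_usage(index=False).sum()
print(before, after)
```

On real data the saving depends on how many columns fit a narrower dtype, so it is worth measuring rather than assuming.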