Why is astype('category') not saving memory in my data frame?

Time:12-20

I have a data frame with a string column that I want to optimize by converting it to 'category'. I am obviously doing something wrong, as I thought category would use far less memory than string.

In [28]: df1.memory_usage()
Out[28]: 
Index          15218784
DATE_CALCUL    15218784
ABN_CONTRAT    15218784
MONTANT_HT     15218784
dtype: int64

In [29]: df1['ABN_CONTRAT'].astype('category').memory_usage()
Out[29]: 28190544

Do you know why?
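One thing worth checking is whether the comparison is like for like. Series.memory_usage() uses index=True by default, so when a frame has a full-size (non-Range) index, the index's bytes are added to the column's total. The sketch below uses made-up data (1,000 distinct strings and an explicit int64 index) in place of df1; only the column name ABN_CONTRAT comes from the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df1 (assumed data): 100_000 rows, 1_000 distinct
# strings, and an explicit int64 index rather than a RangeIndex.
n = 100_000
rng = np.random.default_rng(0)
contracts = np.array([f"C{i:06d}" for i in range(1_000)], dtype=object)
idx = pd.Index(np.arange(n, dtype=np.int64) * 2)
df1 = pd.DataFrame(
    {"ABN_CONTRAT": pd.Series(contracts[rng.integers(0, 1_000, size=n)], index=idx, dtype=object)}
)

as_obj = df1["ABN_CONTRAT"]
as_cat = as_obj.astype("category")

# Default index=True: the index's 8 bytes per row are added to the result.
with_index = as_cat.memory_usage()
# index=False: only the categorical codes plus the array of categories.
without_index = as_cat.memory_usage(index=False)
# Shallow size of the original object column (one pointer per row).
obj_only = as_obj.memory_usage(index=False)

print(with_index, without_index, obj_only)
```

With the index included, the categorical figure can exceed the string column's figure even though the column itself is smaller, which matches the pattern in Out[28] and Out[29].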

Answer:

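Thanks to a comment from AKX, I can answer my own question: using category does indeed save memory. My comparison was misleading because Series.memory_usage() counts the index by default, and df1 had a 15218784-byte index. The categorical column on its own took 28190544 - 15218784 = 12971760 bytes, which is less than the 15218784 bytes of the string column. With a small index (128 bytes here), the saving shows up directly: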

In [10]: df.memory_usage()
Out[10]: 
Index               128
DATE_CALCUL    15490152
ABN_CONTRAT    15490152
MONTANT_HT     15490152
dtype: int64

In [11]: df['ABN_CONTRAT_CAT'] = df['ABN_CONTRAT'].astype('category')

In [12]: df.memory_usage()
Out[12]: 
Index                   128
DATE_CALCUL        15490152
ABN_CONTRAT        15490152
MONTANT_HT         15490152
ABN_CONTRAT_CAT    13107444
dtype: int64
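These figures are still shallow: without deep=True, memory_usage() counts only the 8-byte pointer per row for an object column, not the strings themselves, so the real saving from category is usually much larger than 15490152 vs 13107444 suggests. A minimal sketch on synthetic data (the row count and the 500 distinct values are assumptions, not the poster's data):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (assumed data): 200_000 rows drawn from 500 distinct strings.
n, k = 200_000, 500
rng = np.random.default_rng(42)
labels = np.array([f"CONTRAT_{i:05d}" for i in range(k)], dtype=object)
df = pd.DataFrame({"ABN_CONTRAT": pd.Series(labels[rng.integers(0, k, size=n)], dtype=object)})
df["ABN_CONTRAT_CAT"] = df["ABN_CONTRAT"].astype("category")

# Shallow: the object column counts one 8-byte pointer per row, not the strings.
shallow = df.memory_usage()
# Deep: also counts the Python string objects referenced by each row.
deep = df.memory_usage(deep=True)
print(shallow)
print(deep)

# The categorical column stores small integer codes plus one copy of each distinct value.
cat = df["ABN_CONTRAT_CAT"].cat
print(cat.codes.dtype, len(cat.categories))
```

With few distinct values, the deep figure for the categorical column should be far below that of the object column, because each distinct string is stored once and each row only holds a small integer code (int16 for 500 categories).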