I have a data frame with a column of strings that I want to optimize by converting it to 'category'. I am obviously doing something wrong, because I expected the memory usage to be far lower with category than with strings.
In [28]: df1.memory_usage()
Out[28]:
Index 15218784
DATE_CALCUL 15218784
ABN_CONTRAT 15218784
MONTANT_HT 15218784
dtype: int64
In [29]: df1['ABN_CONTRAT'].astype('category').memory_usage()
Out[29]: 28190544
Do you know why?
CodePudding user response:
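Thanks to a comment from AKX, I can answer my own question. The figure in the question looked too large because Series.memory_usage() includes the index by default (15218784 bytes there), whereas DataFrame.memory_usage() reports the index on its own line. Compared column by column, category does indeed save memory: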
In [10]: df.memory_usage()
Out[10]:
Index 128
DATE_CALCUL 15490152
ABN_CONTRAT 15490152
MONTANT_HT 15490152
dtype: int64
In [11]: df['ABN_CONTRAT_CAT'] = df['ABN_CONTRAT'].astype('category')
In [12]: df.memory_usage()
Out[12]:
Index 128
DATE_CALCUL 15490152
ABN_CONTRAT 15490152
MONTANT_HT 15490152
ABN_CONTRAT_CAT 13107444
dtype: int64
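To make the comparison explicit, here is a minimal sketch on made-up data. The column values, the row count `n` and the non-default integer index are assumptions for illustration, not the original data set. It uses the `index` and `deep` parameters of `Series.memory_usage()`: `index=False` leaves the index out of the total, and `deep=True` also counts the Python string objects themselves, which the default (shallow) figure for an object column does not.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for ABN_CONTRAT: many rows, few distinct strings.
n = 100_000
s = pd.Series([f"CONTRAT_{i % 50}" for i in range(n)],
              dtype=object, name="ABN_CONTRAT")
# Non-default integer index, e.g. what is left after filtering rows.
s.index = np.arange(n) * 2

cat = s.astype("category")

# Default call: the index is included in the total.
with_index = cat.memory_usage()
# Column data only.
without_index = cat.memory_usage(index=False)
index_bytes = cat.index.memory_usage()

# Shallow vs. deep size of the original object column (data only).
obj_shallow = s.memory_usage(index=False)
obj_deep = s.memory_usage(index=False, deep=True)
cat_deep = cat.memory_usage(index=False, deep=True)

print("category incl. index :", with_index)
print("category data only   :", without_index)
print("index                :", index_bytes)
print("object shallow / deep:", obj_shallow, "/", obj_deep)
print("category deep        :", cat_deep)
```

On the real frame, `df1['ABN_CONTRAT'].astype('category').memory_usage(index=False)` should give the figure that is comparable with the per-column numbers from `df1.memory_usage()`, and `df1.memory_usage(deep=True)` should show how much the strings really cost before conversion.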