Extract human readable memory usage for Pandas data frame

Time:06-03

I have a data frame:

import pandas as pd
df = pd.DataFrame({'A': range(1, 10000)})

I can get a nice human-readable string saying that it has a memory usage of 78.2 KB using df.info():

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       9999 non-null   int64
dtypes: int64(1)
memory usage: 78.2 KB

I can get the same information in a less readable form from df.memory_usage(), which returns the per-column usage in bytes (this is also how pandas itself calculates its memory usage), but I would like to avoid having to roll my own formatting. I've looked at the df.info source and traced the source of the string all the way to this line.
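For comparison, rolling your own formatter on top of df.memory_usage() only takes a few lines. The sketch below is modelled on my reading of the private formatting helper in pandas.io.formats.info (divide by 1024 until the value drops below 1024, print one decimal place, optionally append a "+" qualifier); `human_readable_size` is a hypothetical name, not a pandas function, and the internal behaviour may differ between pandas versions.

```python
def human_readable_size(num_bytes, size_qualifier=""):
    """Format a byte count in the style of df.info()'s "memory usage" line.

    Hypothetical helper (not part of pandas): steps through bytes, KB, MB,
    GB and TB, dividing by 1024 each time, and prints one decimal place.
    """
    num = float(num_bytes)
    for unit in ["bytes", "KB", "MB", "GB", "TB"]:
        if num < 1024.0:
            return f"{num:3.1f}{size_qualifier} {unit}"
        num /= 1024.0
    # Anything of 1024 TB or more is reported in PB.
    return f"{num:3.1f}{size_qualifier} PB"


print(human_readable_size(9999 * 8))          # 78.1 KB (column A alone: 9999 int64 values)
print(human_readable_size(512))               # 512.0 bytes
print(human_readable_size(3 * 1024**3, "+"))  # 3.0+ GB
```

In practice you would call it as human_readable_size(df.memory_usage(index=True).sum()). df.info() also counts the index, which is why it reports 78.2 KB rather than the 78.1 KB for column A alone.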

How is this specific string generated, and how can I extract it so that I can print it to a log?

NB: I can't parse the df.info() output, because it prints directly to a buffer (stdout by default) and returns None, so calling str on its result just gives 'None'.

NB: This line does not help either; what it initialises is merely a boolean flag for whether memory usage should be printed at all.
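Regarding the first NB note above: df.info() does not return its text, but it does accept a documented buf argument (any writable buffer), so the report can be captured with io.StringIO and the memory line picked out afterwards. This is a sketch that uses only public API, under the assumption that the output keeps its current "memory usage:" prefix; that wording is presentation, not a documented contract.

```python
import io

import pandas as pd

df = pd.DataFrame({"A": range(1, 10000)})

# df.info() writes to sys.stdout by default and returns None, but its
# buf parameter accepts any writable text buffer, such as io.StringIO.
buffer = io.StringIO()
df.info(buf=buffer)

# Pick out the "memory usage: ..." line and keep only the value after the colon.
memory_line = next(
    line for line in buffer.getvalue().splitlines()
    if line.startswith("memory usage:")
)
memory_string = memory_line.split(":", 1)[1].strip()
print(memory_string)
```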

Answer:

You can create an instance of pandas.io.formats.info.DataFrameInfo and read the memory_usage_string property, which is exactly what df.info() does:

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       9999 non-null   int64
dtypes: int64(1)
memory usage: 78.2 KB
>>> pd.io.formats.info.DataFrameInfo(df).memory_usage_string.strip()
'78.2 KB'
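Since the goal is to write the figure to a log, one way to package that call is a small wrapper used with the standard logging module. `memory_usage_string` here is a hypothetical helper name; DataFrameInfo lives in pandas.io.formats.info, which is an internal module rather than documented public API, so this sketch may need adjusting across pandas versions.

```python
import logging

import pandas as pd
from pandas.io.formats.info import DataFrameInfo

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def memory_usage_string(df, memory_usage=None):
    """Return the memory figure df.info() would print, e.g. '78.2 KB'.

    Hypothetical wrapper around DataFrameInfo (an internal pandas class).
    memory_usage is passed through unchanged, as df.info() does.
    """
    info = DataFrameInfo(df, memory_usage=memory_usage)
    # memory_usage_string carries a trailing newline, hence the strip().
    return info.memory_usage_string.strip()


df = pd.DataFrame({"A": range(1, 10000)})
logger.info("DataFrame memory usage: %s", memory_usage_string(df))
```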

If you're passing memory_usage to df.info, you can pass it directly to DataFrameInfo:

pd.io.formats.info.DataFrameInfo(df, memory_usage='deep').memory_usage_string.strip()
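To see what passing memory_usage changes, the sketch below compares the default and 'deep' settings on an object-dtype column (the column contents are made up for illustration). By default only the object references are counted and the figure is marked with a "+", just as df.info() shows it; with 'deep' the string objects themselves are measured and the "+" goes away. As above, this relies on the internal DataFrameInfo class.

```python
import pandas as pd
from pandas.io.formats.info import DataFrameInfo

# An explicitly object-dtype column of Python strings (illustrative data).
df = pd.DataFrame({"B": pd.Series(["x" * 10] * 1000, dtype=object)})

# Default (shallow): only the object pointers are counted, so the figure
# is flagged with "+" to signal that the real usage is higher.
shallow = DataFrameInfo(df).memory_usage_string.strip()

# 'deep': the Python string objects themselves are measured as well.
deep = DataFrameInfo(df, memory_usage="deep").memory_usage_string.strip()

print(shallow)
print(deep)
```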