I have a data frame:
import pandas as pd
df = pd.DataFrame({'A': range(1, 10000)})
I can get a nice human-readable summary, reporting a memory usage of 78.2 KB, using df.info():
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       9999 non-null   int64
dtypes: int64(1)
memory usage: 78.2 KB
I can get the same information in a less helpful form, a raw per-column byte count, using df.memory_usage() (and this is how pandas itself calculates its own memory usage), but I would like to avoid having to roll my own formatting. I've looked at the df.info source and traced the origin of the string all the way to this line.
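For reference, rolling my own would look roughly like the sketch below. It assumes 1024-based units and one decimal place, which is what the df.info() output suggests; human_readable_size is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def human_readable_size(num_bytes):
    """Format a byte count with 1024-based units (hypothetical helper, not part of pandas)."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB", "TB"):
        if size < 1024.0:
            return f"{size:.1f} {unit}"
        size /= 1024.0
    return f"{size:.1f} PB"


df = pd.DataFrame({"A": range(1, 10000)})

# df.memory_usage() returns a Series of byte counts (index included by default).
total_bytes = int(df.memory_usage().sum())
print(human_readable_size(total_bytes))
```

This works, but it duplicates formatting logic that pandas already has somewhere, which is what I am trying to avoid.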
How is this specific string generated and how can I pull that out so I can print it to a log?
NB: I can't parse the df.info() output directly because it prints straight to a buffer; the call itself returns None, so calling str on it just gives 'None'.
NB: This line also does not help; what is initialised there is merely a boolean flag for whether memory usage should be printed at all.
CodePudding user response:
You can create an instance of pandas.io.formats.info.DataFrameInfo and read its memory_usage_string property, which is exactly what df.info() does:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9999 entries, 0 to 9998
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       9999 non-null   int64
dtypes: int64(1)
memory usage: 78.2 KB
>>> pd.io.formats.info.DataFrameInfo(df).memory_usage_string.strip()
'78.2 KB'
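Since the goal is to get the string into a log, a minimal sketch of that step might look like this. The helper name memory_usage_text and the logger name are hypothetical, and pandas.io.formats.info is an internal module, so its location or behaviour may differ between pandas versions.

```python
import logging

import pandas as pd
from pandas.io.formats.info import DataFrameInfo  # internal module, may change between versions

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("memory_report")  # hypothetical logger name


def memory_usage_text(frame):
    """Return the human-readable memory usage that df.info() would print (hypothetical helper)."""
    # memory_usage_string ends with a newline, hence the strip().
    return DataFrameInfo(frame).memory_usage_string.strip()


df = pd.DataFrame({"A": range(1, 10000)})
usage = memory_usage_text(df)
logger.info("DataFrame memory usage: %s", usage)
```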
If you're passing memory_usage to df.info, you can pass it directly to DataFrameInfo:
pd.io.formats.info.DataFrameInfo(df, memory_usage='deep').memory_usage_string.strip()
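If you would rather not depend on an internal module, one possible alternative is to capture the df.info() output through its buf parameter and pick out the memory line. This is only a sketch: it assumes the summary keeps a line starting with "memory usage:", as in the output shown above, and the variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"A": range(1, 10000)})

# df.info() writes to sys.stdout by default; buf redirects it to any writable buffer.
buffer = io.StringIO()
df.info(buf=buffer)

# Assumption: the summary contains a line of the form "memory usage: <value>".
memory_line = next(
    line for line in buffer.getvalue().splitlines()
    if line.startswith("memory usage:")
)
memory_usage = memory_line.split(":", 1)[1].strip()
print(memory_usage)
```

The trade-off is that this parses display text, so it would break if the wording of the summary ever changed, whereas the DataFrameInfo approach would break if the internal class were moved or renamed.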