I need some clarification here. I looked around but could not find a definitive answer.
stats_df seems to print only if it is the last thing run. Does the result of data.describe() always have to be the last statement in the cell?
I am doing this in a Jupyter notebook.
import pandas as pd
data = pd.read_csv('data.csv')
stats_df = data.describe()
stats_df.loc['range'] = stats_df.loc['max'] - stats_df.loc['min']
# will not print
stats_df
out_fields = ['mean','25%','50%', '75%', 'range']
stats_df = stats_df.loc[out_fields]
stats_df.rename({'50%':'median'}, inplace=True)
# will print
stats_df
CodePudding user response:
In a Jupyter notebook, the value of the last statement in a cell is shown as the cell's result, provided it is not None. That value is also stored in the dictionary Out, so you can reuse it. Strictly speaking, though, this is not printing. You can suppress it by putting a semicolon ; at the very end of the code in the cell, which avoids keeping the data in Out and the trouble of removing it later. To print something below the current cell from any position in it, use display, print, pprint or a similar function.
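To make the "last statement" rule concrete, here is a rough sketch of it in plain Python. The function run_cell is a hypothetical name of my own and only a simplified model, not IPython's actual implementation: it runs every statement and hands back a value only when the last statement is a bare expression that is not followed by a semicolon.

```python
import ast


def run_cell(source, namespace=None):
    """Hypothetical, simplified model of how a notebook runs one cell."""
    namespace = {} if namespace is None else namespace
    tree = ast.parse(source)
    if not tree.body:
        return None
    *head, last = tree.body
    # Crude check: a trailing semicolon suppresses the result.
    suppressed = source.rstrip().endswith(";")
    if isinstance(last, ast.Expr) and not suppressed:
        # Run everything before the last statement, then evaluate the
        # last expression and return its value (the cell's "Out").
        exec(compile(ast.Module(body=head, type_ignores=[]), "<cell>", "exec"), namespace)
        return eval(compile(ast.Expression(body=last.value), "<cell>", "eval"), namespace)
    # Otherwise run the whole cell; expressions in the middle are
    # evaluated and their values thrown away.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


ns = {}
first = run_cell("x = 2\nx\ny = x + 1", ns)  # bare x is mid-cell: None
second = run_cell("y", ns)                   # last expression: 3
third = run_cell("y;", ns)                   # semicolon: None
```

This mirrors what happens in the question: the first bare stats_df sits in the middle of the cell, so its value is discarded, while the second one is the last statement and gets shown.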
Some functions in numpy, pandas, etc. write their information to standard output, e.g. numpy.info and pandas.DataFrame.info. Others return descriptive information without printing it, e.g. pd.DataFrame.describe returns a data frame but doesn't print anything. So you need some other function to show its output. In your case, I would use display(stats_df).
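A small sketch of the difference, and of the fix applied to the code from the question. The sample frame below is made up and only stands in for data.csv, and the print fallback for display is my own addition so the snippet also runs outside a notebook.

```python
import contextlib
import io

import pandas as pd

try:
    from IPython.display import display  # what Jupyter provides
except ImportError:
    display = print  # assumed fallback for plain Python

# Made-up sample data standing in for data.csv.
data = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

# describe() returns a DataFrame and writes nothing to stdout.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    stats_df = data.describe()
describe_printed = buffer.getvalue()

# info() writes its report to stdout and returns None.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    info_result = data.info()
info_printed = buffer.getvalue()

stats_df.loc["range"] = stats_df.loc["max"] - stats_df.loc["min"]
display(stats_df)  # explicit call: shown even though it is mid-cell

out_fields = ["mean", "25%", "50%", "75%", "range"]
stats_df = stats_df.loc[out_fields].rename({"50%": "median"})
display(stats_df)
```

With explicit display calls, both versions of stats_df appear below the cell, regardless of where the statements sit.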