pandas data.describe() will not print unless it is last entry?

Time:05-24

I'm looking for some clarification here. I searched around but could not find a definitive explanation.

stats_df seems to print only if it is the last thing run in the cell. Does the result of data.describe() always have to come last?

I am running this in a Jupyter notebook.

import pandas as pd

data = pd.read_csv('data.csv')

stats_df = data.describe()

stats_df.loc['range'] = stats_df.loc['max'] - stats_df.loc['min']

# will not print
stats_df


out_fields = ['mean','25%','50%', '75%', 'range']
stats_df = stats_df.loc[out_fields]

stats_df.rename({'50%':'median'}, inplace=True)

# will print
stats_df

CodePudding user response:

In a Jupyter notebook, the value of the last statement in a cell is shown as the cell's result, provided it is an expression whose value is not None. That value is also stored in the dictionary Out, so you can reuse it. This is not printing in the strict sense. You can suppress it by putting a semicolon (;) at the very end of the code in the cell, which also keeps the value out of Out and avoids trouble later when you want those objects removed. To show something in the cell's output from any other line, call display, print, pprint or a similar function explicitly.
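
The "last statement only" rule can be illustrated with a toy model. This is only a sketch of the idea, not IPython's actual implementation: run_cell is a hypothetical helper, and it ignores the trailing-semicolon rule and the Out dictionary.

```python
import ast

def run_cell(source, namespace):
    """Toy model of a notebook cell: run every statement, and return the
    value of the last one only if it is a bare expression (else None)."""
    tree = ast.parse(source)
    if not tree.body:
        return None
    *body, last = tree.body
    # Statements before the last one are executed; a bare expression
    # among them (like a lone `stats_df`) is evaluated and then discarded.
    exec(compile(ast.Module(body=body, type_ignores=[]), "<cell>", "exec"), namespace)
    if isinstance(last, ast.Expr):
        # The final bare expression is the only value handed back for display.
        return eval(compile(ast.Expression(body=last.value), "<cell>", "eval"), namespace)
    exec(compile(ast.Module(body=[last], type_ignores=[]), "<cell>", "exec"), namespace)
    return None

ns = {}
shown = run_cell("x = 1\nx\ny = x + 1\ny", ns)
print(shown)  # 2: the mid-cell `x` was evaluated but never shown
```

In this model the first bare `x` plays the role of the first `stats_df` in the question: it runs without error, but nothing is done with its value.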

Some functions in numpy, pandas, etc. write their information to standard output, e.g. numpy.info and pandas.DataFrame.info. Others return the descriptive information without printing it: pd.DataFrame.describe returns a DataFrame and prints nothing by itself. So you have to call some other function to show its result. In your case, I would use display(stats_df).
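
Applied to the code in the question, that could look like the sketch below. The small DataFrame is a made-up stand-in for data.csv, and the try/except fallback to print is only there so the snippet also runs outside Jupyter (inside a notebook, display is already available).

```python
import pandas as pd

try:
    from IPython.display import display  # renders a rich table in Jupyter
except ImportError:
    display = print  # plain-text fallback when IPython is not installed

# Stand-in for pd.read_csv('data.csv'); the values are invented for illustration.
data = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10.0, 20.0, 30.0, 40.0]})

stats_df = data.describe()
stats_df.loc["range"] = stats_df.loc["max"] - stats_df.loc["min"]

# Explicit call: shown even though this is not the last line of the cell.
display(stats_df)

out_fields = ["mean", "25%", "50%", "75%", "range"]
stats_df = stats_df.loc[out_fields].rename(index={"50%": "median"})

# Last expression in the cell: Jupyter shows it automatically.
stats_df
```

With this, both versions of stats_df appear in the cell output: the first through display, the second because it is the cell's final expression.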
